In the first two posts (post-1, post-2) of this series, we built the biological and technical foundations of the project. We started by revisiting the role of telomerase in cancer and why its reactivation is central to unlimited cell division. We then assembled a reproducible multimodal dataset combining RNA-seq–derived telomerase labels with whole-slide histology images (WSIs) from TCGA, and showed that a simple image-based baseline already captures meaningful signal.

In this third post, we take the next conceptual step: instead of treating each slide as a single global object, we attempt to learn from the thousands of image patches that compose a WSI. Our original motivation is straightforward: if telomerase activity leaves a morphological footprint, where in the tissue does this signal come from?

To answer this, we turn to Attention-Based Multiple-Instance Learning (ABMIL), a framework designed precisely for weakly supervised problems like digital pathology. Along the way, however, we encounter an unexpected and instructive result: a simple global aggregation model outperforms ABMIL. Far from being a failure, this outcome teaches us something important about the biology of telomerase and the structure of the learning problem.

Whole-slide images are enormous. A single WSI can contain hundreds of thousands of patches, each capturing a tiny region of tissue. Yet our supervision signal, TERT expression measured by RNA-seq, exists only at the slide (or patient) level. We do not know which patches are relevant, only that somewhere in the slide there may be visual correlates of telomerase activity.

This mismatch between many local inputs and one global label is a textbook case for Multiple-Instance Learning (MIL): each slide is a bag, its patches are the instances, and the label is attached to the bag rather than to any single instance.

In classical MIL settings, a positive bag is assumed to contain at least one informative instance.
In pathology, this might correspond to a small tumor focus or a rare cell population.

ABMIL extends this idea by replacing hard instance selection with a learned attention mechanism. Instead of deciding which patch matters in advance, the model learns a weight for each patch and computes a weighted average of patch embeddings to form a slide-level representation.

Conceptually, the pipeline looks like this: patch embeddings, then one learned attention weight per patch, then a weighted average that yields the slide-level representation.

This architecture is appealing for two reasons:

At this stage, ABMIL seems perfectly suited to our goal of localizing telomerase-related morphology.

When we trained ABMIL on the full dataset, several challenges emerged.

First, overfitting appeared quickly. Training loss decreased steadily, but test loss often increased, and performance metrics plateaued. This was mitigated through a combination of:

Second, we observed that not all patches are equally informative. Many patches correspond to background or visually homogeneous tissue. To improve the signal-to-noise ratio, we introduced a simple heuristic: select patches with high embedding variance, then sample from this subset during training. This stabilized training and improved performance.

With these refinements, ABMIL reached respectable results, confirming that the model was indeed extracting meaningful signal from histology.

At the end of Post 2, we introduced a baseline method: mean pooling of patch embeddings, followed by a standard multilayer perceptron. Surprisingly, this simple approach consistently outperformed ABMIL:

This result forces us to pause and reassess our assumptions.

Why does the simpler model win? The answer lies in the biological nature of telomerase activity and the type of supervision available.

Telomerase activation is not a rare, focal event like a micrometastasis. It is tightly linked to global tumor properties such as proliferation, cellularity, and nuclear atypia.

These features tend to be distributed across large regions of the tumor, not confined to a small subset of patches.
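
To make the comparison between the two aggregation strategies concrete, here is a minimal sketch in plain Python. It is not the code used in this series: the function names, the toy dimensions, the `tanh` scoring form (the commonly used ABMIL formulation), and the reading of "embedding variance" as per-patch variance across embedding dimensions are all assumptions made for illustration.

```python
import math
import random


def mean_pool(patches):
    """Average all patch embeddings into one slide-level vector."""
    n = len(patches)
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / n for d in range(dim)]


def attention_pool(patches, V, w):
    """ABMIL-style pooling (assumed form): score_k = w . tanh(V h_k),
    softmax over patches, then a weighted average of the embeddings."""
    scores = []
    for h in patches:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wj * zj for wj, zj in zip(w, hidden)))
    # Numerically stable softmax over all patches in the slide.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(patches[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, patches)) for d in range(dim)]
    return pooled, weights


def top_variance_patches(patches, keep):
    """Hypothetical version of the variance heuristic: keep the `keep`
    patches whose embedding values vary most across dimensions."""
    def variance(h):
        mu = sum(h) / len(h)
        return sum((x - mu) ** 2 for x in h) / len(h)
    return sorted(patches, key=variance, reverse=True)[:keep]


# Toy data: 6 patches with 4-dimensional embeddings (illustrative sizes only).
random.seed(0)
patches = [[random.gauss(0, 1) for _ in range(4)] for _ in range(6)]
V = [[random.gauss(0, 0.1) for _ in range(4)] for _ in range(3)]
w = [random.gauss(0, 0.1) for _ in range(3)]

subset = top_variance_patches(patches, keep=4)
slide_mean = mean_pool(subset)
slide_attn, weights = attention_pool(subset, V, w)
```

With all-zero scoring weights, the attention weights become uniform and `attention_pool` returns the same vector as `mean_pool`. Mean pooling can therefore be read as the special case of attention pooling in which no patch is preferred over another.
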
Because telomerase-linked tissue features are widespread, many patches carry weak but consistent information related to telomerase status.

Mean pooling aggregates this diffuse signal effectively:

Our labels are derived from RNA-seq and thresholded at the cohort level. They are inherently noisy and provide no spatial guidance. Under such weak supervision, ABMIL is forced to assign meaning to individual patches without ground truth, increasing the risk of focusing on spurious correlations.

Mean pooling makes a much weaker assumption: all patches contribute partial evidence. In this context, that assumption turns out to be better aligned with reality.

Finally, the mean-pooling model is simply easier to train. It operates on one vector per slide, has smoother gradients, and avoids the instability introduced by attention normalization over thousands of instances. When the signal is global, this simplicity becomes an advantage rather than a limitation.

The takeaway is not that ABMIL is ineffective, but that the problem itself is not strongly instance-driven. Telomerase-associated morphology appears to be a global tissue phenotype, and global aggregation provides a better inductive bias.

ABMIL remains valuable for interpretability and for problems where the signal is focal.

But for the core prediction task, mean pooling captures the dominant signal more efficiently.

Lessons Learned

Not every WSI problem is a multiple-instance problem. Although WSIs are composed of many patches, the predictive signal may still be global rather than localized.

Biology determines the right inductive bias. Telomerase activity reflects widespread tumor properties (proliferation, cellularity, nuclear atypia), making global aggregation more effective than patch selection.

Weak supervision favors simpler aggregation. When labels come from slide-level RNA measurements, enforcing instance-level explanations can amplify noise instead of reducing it.

Attention is a tool, not a guarantee.
Attention-based MIL provides interpretability and is powerful for focal phenomena, but it is not universally superior to simpler pooling strategies.

Strong baselines are essential. Comparing against a simple mean-pooling model was crucial to correctly interpret ABMIL’s behavior and avoid overfitting the modeling approach to the data.

Across these three posts, we set out to explore whether telomerase activity, a fundamentally molecular property, leaves a detectable imprint in tumor histology.

We began by grounding the problem in cancer biology, then built a multimodal dataset linking RNA-seq–derived telomerase labels to whole-slide images. Finally, we explored increasingly structured models, from global pooling to attention-based multiple-instance learning.

The most important result is not a particular metric, but a conceptual one: telomerase-associated morphology appears to be predominantly global rather than focal. In this regime, simple aggregation strategies outperform more complex instance-selection mechanisms, especially under weak supervision.

This outcome reinforces a broader lesson in applied machine learning for biology: architectural sophistication must follow biological structure, not precede it. Attention-based models remain invaluable tools, but only when the underlying signal truly demands them.

With this, we close the series having not only improved performance, but also gained a clearer understanding of how telomerase activity manifests in histological space.

AI for breakthroughs, not buzzwords.

© 2026 Barnacle Labs Ltd.