Training AI to accurately identify biomarkers in tumor biospecimens

Guest editorial by Courtney Noah, Ph.D., and Daryl Waggott, MSc

24 Sept 2025

Editorial article

In this guest editorial, learn how incorporating AI algorithms into pathology workflows is enabling the detection of tumors and tumor subtypes.

Digital pathology is a rapidly growing field. Key to this growth has been the development of whole slide imaging (WSI), the ability to scan traditional pathology biospecimens on glass slides and create high-quality digital images that are easier to store, share with other medical professionals, and analyze using AI.

WSI paired with AI and machine learning models has opened new doors for image analysis, leading to innovations in biomarker identification and diagnostic development. WSIs are essential for developing AI solutions in computational pathology because they provide a comprehensive digital representation of tissue architecture. This representation enables AI models to infer genetic mutations from structural changes present in the tissue, leveraging the principle that genetic alterations drive corresponding morphological transformations. That link is particularly important when training AI algorithms to analyze tissue biospecimens and increase their diagnostic accuracy.

Developing AI models for digital pathology requires meticulously collected biospecimens and curated, standardized data. Model developers need high-quality biospecimens that were processed, stored, and sectioned following the same clinical protocol. The biospecimens must also be supplied with the appropriate patient demographic and clinical data, as well as molecular data to identify the mutations and biomarkers present.
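One way to make these requirements concrete is a per-slide record plus a simple completeness check. The sketch below is illustrative only: the `SlideRecord` fields, the `find_problems` helper, and the default marker names are assumptions made for this example, not a schema used by BioIVT or any vendor.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlideRecord:
    """One whole-slide image plus the annotations a training set needs.

    All field names are hypothetical and chosen for illustration.
    """
    slide_id: str
    protocol_id: str              # processing/storage/sectioning protocol
    age: Optional[int]            # patient demographic data
    sex: Optional[str]
    diagnosis: Optional[str]      # clinical data
    biomarkers: dict = field(default_factory=dict)  # molecular data


def find_problems(records, required_markers=("ER", "PR", "HER2")):
    """Return human-readable problems that would weaken a training cohort."""
    problems = []

    # Every specimen should have followed the same clinical protocol.
    protocols = {r.protocol_id for r in records}
    if len(protocols) > 1:
        problems.append(f"mixed protocols: {sorted(protocols)}")

    for r in records:
        # Molecular data must be present for each marker of interest.
        missing = [m for m in required_markers if m not in r.biomarkers]
        if missing:
            problems.append(f"{r.slide_id}: missing molecular data for {missing}")
        # Demographic and clinical data must accompany the slide.
        if r.age is None or r.sex is None or r.diagnosis is None:
            problems.append(f"{r.slide_id}: incomplete demographic or clinical data")
    return problems


# Made-up records, used only to exercise the check.
records = [
    SlideRecord("S1", "P-A", 54, "F", "breast adenocarcinoma",
                {"ER": True, "PR": True, "HER2": False}),
    SlideRecord("S2", "P-B", None, "F", "breast adenocarcinoma",
                {"ER": True}),
]
for problem in find_problems(records):
    print(problem)
```

Running a check like this before training surfaces protocol mismatches and missing annotations early, when they are cheapest to fix.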

Biological diversity

With oncology cases, WSI provides a complete view across the tumor, adjacent normal tissue, and the edges and artifacts, revealing true tissue diversity and variability. It is important to address heterogeneity, not only within the WSI scan, but also across the entire tissue block. One side of the tumor can be quite different from the other. These data provide AI with the requisite edge cases and context to broaden its knowledge.

Technical diversity

AI models also need diversity in the quality of the samples on which they are trained. It is critical that the training set properly represents real-world scenarios, which may include samples with artifacts, low tumor cellularity, and/or edge cases. As the AI is learning, it needs to recognize that not every sample will look perfect, but it still needs to show robustness in delivering the right result. Researchers want not only 40X images in the training set but also 20X and 10X images. Capturing both biological and technical diversity is important.
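A quick audit of a training manifest can show whether this technical diversity is actually present. The following is a minimal sketch under stated assumptions: the `coverage_report` helper, the manifest keys (`magnification`, `has_artifact`), the 10% threshold, and the sample counts are all hypothetical and not drawn from any real dataset.

```python
from collections import Counter


def coverage_report(samples, key, expected, min_fraction=0.1):
    """Return each expected category's share of the samples, plus the
    categories whose share falls below min_fraction."""
    if not samples:
        raise ValueError("samples must not be empty")
    counts = Counter(s[key] for s in samples)
    # Counter returns 0 for categories that never appear.
    shares = {cat: counts[cat] / len(samples) for cat in expected}
    under = [cat for cat in expected if shares[cat] < min_fraction]
    return shares, under


# Hypothetical manifest: mostly clean 40X scans, no 10X scans at all.
manifest = (
    [{"magnification": "40X", "has_artifact": False}] * 7
    + [{"magnification": "40X", "has_artifact": True}] * 1
    + [{"magnification": "20X", "has_artifact": False}] * 2
)

shares, under = coverage_report(manifest, "magnification", ["40X", "20X", "10X"])
print(shares)
print(under)  # ['10X']
```

The same helper can be pointed at other keys, such as `has_artifact`, to check that imperfect samples are represented rather than filtered out.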

Breast cancer case study

Molecular profiling of the estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) in malignant breast tumors is used to determine how the cancer is developing and to choose the best therapy for the patient. The status of those molecular biomarkers is typically analyzed using immunohistochemistry (IHC) and in-situ hybridization, tests that are expensive and time-consuming both for a laboratory to perform and for a pathologist to interpret.²

Panakeia, a pioneer in AI-driven multi-omic biomarker profiling, has developed a software platform that can provide comprehensive multi-omic information (DNA, RNA, protein, metabolites) from routinely used images of tissue samples.³ Its PANProfiler Breast solution uses AI to assess the ER, PR, and HER2 status of breast adenocarcinoma by analyzing digital images of hematoxylin and eosin (H&E) stained biopsy or resection slides.

Panakeia partnered with BioIVT to obtain a proprietary dataset that it could use to train its AI models on biomarker identification. Using this dataset together with other public and proprietary datasets, the company developed and validated AI models for detecting molecular biomarkers in breast cancer.²

BioIVT’s dataset consisted of more than 1,000 scanned slides from formalin-fixed, paraffin-embedded core breast biopsy and resection sections. All the slide images were well characterized and reviewed by a BioIVT pathologist to confirm that they portrayed the clinical diagnosis. The images were delivered together with each patient’s ER, PR, and HER2 status, demographic and clinical information, and any follow-up data.

Since its initial development and validation using BioIVT data, subsequent iterations of PANProfiler Breast (ER, PR, HER2) have achieved superior performance following multi-site validation. More specifically, PANProfiler Breast displays a sensitivity of 98.2% and specificity of 62.0% for ER, a sensitivity of 97.9% and specificity of 45.7% for PR, and a sensitivity of 90.6% and specificity of 100.0% for HER2.⁴ These advancements enabled the company to secure UKCA and CE (IVDD) marks for its in vitro medical devices, which can provide pathologists with quick, accurate, reproducible patient biomarker profiles.
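For readers less familiar with these metrics, sensitivity is the share of truly positive cases the model flags, TP / (TP + FN), and specificity is the share of truly negative cases it clears, TN / (TN + FP). The short sketch below computes both from confusion-matrix counts. The counts are invented purely for illustration and are not taken from the validation study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that were predicted positive."""
    if true_pos + false_neg == 0:
        raise ValueError("no positive cases to evaluate")
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives that were predicted negative."""
    if true_neg + false_pos == 0:
        raise ValueError("no negative cases to evaluate")
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for one biomarker (NOT study data):
# 100 marker-positive slides and 100 marker-negative slides.
tp, fn = 90, 10
tn, fp = 45, 55

print(f"sensitivity = {sensitivity(tp, fn):.1%}")  # sensitivity = 90.0%
print(f"specificity = {specificity(tn, fp):.1%}")  # specificity = 45.0%
```

A profile with high sensitivity and lower specificity means few positive cases are missed, at the cost of more false positives that need confirmatory review.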

This case study clearly demonstrates how WSI, with associated clinical and genomic data, can be leveraged to develop digital pathology AI algorithms.

Dynamic dataset

BioIVT has an extensive donor network (11 donor centers and 425 clinical sites globally), which allows it to continually acquire new biospecimens. This growing biospecimen inventory will allow BioIVT to address new and emerging scientific challenges and opportunities that AI will uncover. Many AI products and projects fail because they rely on a static dataset that cannot accommodate the changes required when researchers realize they need to head in a different direction or pivot slightly.

Benefits of AI

Incorporating AI algorithms into pathology workflows is enabling the detection of tumors and tumor subtypes, identification of novel morphological structures, and analysis of quantitative biomarkers, thus supporting precision medicine.

While AI-assisted insights are valuable, they are used to support and not replace pathologists’ insights. Pathologists remain the ultimate diagnostic decision-makers.

Author biographies

Dr. Courtney Noah is BioIVT's Vice President of Scientific Affairs. She leads a team that provides solutions for BioIVT’s clients and business partners. Dr. Noah received her Ph.D. in Molecular and Cellular Biology from Stony Brook University, and her BS in Food Science from Cornell University.

Daryl Waggott is Director - Biologics, Data Products at BioIVT. Daryl’s areas of expertise span genomics, digital health, and AI-driven physiology. He helps BioIVT develop new data products, including longitudinal disease collections and regulatory-specific datasets and services.

References

1. Kiran N, Sapna F, Kiran F, Kumar D, Raja F, Shiwlani S, Paladini A, Sonam F, Bendari A, Perkash RS, Anjali F, Varrassi G. Digital Pathology: Transforming Diagnosis in the Digital Age. Cureus. 2023 Sep 3;15(9):e44620. doi: 10.7759/cureus.44620. PMID: 37799211; PMCID: PMC10547926.

2. Salim Arslan, Xiusi Li, Julian Schmidt, Julius Hense, Andre Geraldes, Cher Bass, Keelan Brown, Angelica Marcia, Tim Dewhirst, Pahini Pandya, Shikha Singhal, Debapriya Mehrotra, Pandu Raharja-Liu. Evaluation of a predictive method for the H&E-based molecular profiling of breast cancer with deep learning. bioRxiv 2022.01.04.474882; doi: https://doi.org/10.1101/2022.01.04.474882

3. Arslan, S., Schmidt, J., Bass, C. et al. A systematic pan-cancer study on deep learning-based prediction of multi-omic biomarkers from routine pathology images. Commun Med 4, 48 (2024); doi: https://doi.org/10.1038/s43856-024-00471-5

4. S. Wolf, C. Bass, F. Ntelemis, A. Geraldes, J. Schmidt, D. Mehrotra, S. Singhal, N. Kumar, J. Blackwood, L. Spary, D. Leff, I.H. Um, J. Loane, A. Khurram, G. Bryson, R. Clarkson, D.J. Harrison, J.N. Kather, P. Pandya, S. Arslan. A large-scale multi-site validation of a deep learning approach for determining ER, PR, and HER2 status from H&E-stained breast cancer slides. ESMO Open, Volume 10, Supplement 4, 2025, 104608; doi: https://doi.org/10.1016/j.esmoop.2025.104608
