Abstract
Background: Accurate classification of hematologic malignancies remains challenging due to overlapping morphological, immunophenotypic, genomic, and immunological features. Shared mutational patterns among subtypes of lymphoid and myeloid malignancies further complicate accurate molecular classification. Additionally, distinguishing between clonal driver mutations, subclonal events, and clonal hematopoiesis of indeterminate potential (CHIP) complicates interpretation of genomic profiles, often requiring extensive expert input and delaying definitive diagnoses. Developing robust and interpretable machine learning (ML) frameworks capable of integrating genomic and immune repertoire features will greatly benefit efforts to accelerate and standardize diagnostics.
Methods: We curated a cohort of 11,090 specimens from 7,543 patients profiled using the Stanford Actionable Mutation Panel for Hematopoietic and Lymphoid Neoplasms (Heme-STAMP) between Oct. 2020 and Jun. 2025. Heme-STAMP is a targeted NGS panel curated for clinical relevance in myeloid and lymphoid malignancies that employs deep sequencing (~2000X coverage). Cases were initially annotated with 138 detailed diagnostic categories and then harmonized into 15 major malignant classes. Around 30% of cases lacked an established diagnosis at the time of testing (i.e., sequencing from diagnostic biopsy in parallel with pathology review) and were therefore excluded. Lastly, samples without clinical suspicion of malignancy comprised the benign class.
Genomic features used for classifier training encompassed somatic mutations, canonical fusions/translocations, and focal copy number alterations (CNAs). To prioritize truncal mutations, we introduced a variant allele frequency (VAF) rank feature that captures clonal dominance within each sample. T-cell receptor (TCR) and immunoglobulin (IG) sequencing data from 1,001 samples were used to derive immune repertoire features. Finally, we developed a comprehensive suite of ML models as well as advanced neural networks. We coupled these models with an explainable AI framework that generates transparent, rule-based rationales alongside predictions, and called the combined framework Heme-xAIGen.
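As an illustration of the VAF rank feature described above, the following minimal sketch ranks mutations within each sample by descending VAF, so that rank 1 marks the most clonally dominant variant. The abstract does not specify how the feature was computed; the dense-ranking rule for ties, the function name `add_vaf_rank`, the record fields, and the example values are all hypothetical.

```python
from collections import defaultdict


def add_vaf_rank(mutations):
    """Return copies of the mutation records with a within-sample VAF rank.

    Rank 1 is the highest VAF in the sample. Ties share a rank (dense
    ranking), which is an assumption made for this sketch.
    """
    by_sample = defaultdict(list)
    for mutation in mutations:
        by_sample[mutation["sample"]].append(mutation)

    ranked = []
    for sample_mutations in by_sample.values():
        ordered = sorted(sample_mutations, key=lambda m: m["vaf"], reverse=True)
        rank = 0
        previous_vaf = None
        for mutation in ordered:
            if mutation["vaf"] != previous_vaf:
                rank += 1  # advance the rank only when the VAF changes
                previous_vaf = mutation["vaf"]
            ranked.append({**mutation, "vaf_rank": rank})
    return ranked


# Hypothetical records, not taken from the study cohort.
example = [
    {"sample": "S1", "gene": "DNMT3A", "vaf": 0.45},
    {"sample": "S1", "gene": "NPM1", "vaf": 0.40},
    {"sample": "S1", "gene": "FLT3", "vaf": 0.12},
    {"sample": "S2", "gene": "TET2", "vaf": 0.03},
]
ranked = add_vaf_rank(example)
for record in ranked:
    print(record["sample"], record["gene"], record["vaf_rank"])
```

A rank is scale-free within a sample, so under these assumptions a low-purity specimen and a high-purity specimen with the same clonal ordering would receive the same feature values.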
Results: After excluding samples labeled as unknown malignancy, the training (80%) and validation (20%) cohorts comprised diverse malignancies, including AML (15.6%), MPN (15.5%), PCNs (11.3%), CLL/SLL (8.6%), MDS (8.9%), DLBCL (3.9%), and FL (2.8%). Benign cases accounted for ~13% of the entire dataset. As expected, TET2 (29.6%), ASXL1 (26.7%), and DNMT3A (20.5%) were the most frequently mutated genes in myeloid neoplasms. In the mature lymphoid neoplasms, we observed high frequencies of TP53 (17%), CREBBP (12.7%), MYD88 (10.9%), and BCL2 (5.2%). Analysis of the benign samples demonstrated known biological patterns, particularly a significant correlation between patient age and CHIP-associated mutation burden (R = 0.42, P < 0.05). Furthermore, TCL cases exhibited a significantly higher frequency of positive TCR clonality results than mature B-cell neoplasms (80.5% vs. 16.5%; P < 0.001).
The ML-based pipeline accurately classified hematological malignancies, with the best-performing model achieving an overall AUC of 0.89 and yielding high individual AUCs (e.g., AML: 0.80; MPN: 0.91; CLL/SLL: 0.89; DLBCL: 0.78). Feature importance analyses showed that all feature types contributed significantly, with mutations and CNAs demonstrating the greatest impact. The Heme-xAIGen framework slightly improved discrimination (AUC = 0.90) while generating biologically coherent rules, for example, the presence of a CCND1::IGH fusion and absence of myeloid driver mutations (FLT3 and NPM1) for MCL, or the presence of FLT3 and NPM1 mutations and absence of lymphoid drivers for AML.
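For readers unfamiliar with per-class AUCs in a multi-class setting, the sketch below computes one-vs-rest AUCs from predicted class probabilities using the pairwise (Mann-Whitney) definition. The abstract does not state how the overall AUC was aggregated, so the macro-average shown here is an assumption, and the labels and probabilities are made-up values rather than study data.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for is_pos, s in zip(labels, scores) if is_pos]
    negatives = [s for is_pos, s in zip(labels, scores) if not is_pos]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def one_vs_rest_auc(y_true, probabilities, classes):
    """Per-class AUC: each class in turn is scored against all other classes."""
    return {
        cls: binary_auc(
            [label == cls for label in y_true],
            [row[index] for row in probabilities],
        )
        for index, cls in enumerate(classes)
    }


# Hypothetical predictions for five specimens; columns follow `classes`.
classes = ["AML", "MPN", "CLL/SLL"]
y_true = ["AML", "AML", "MPN", "MPN", "CLL/SLL"]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
aucs = one_vs_rest_auc(y_true, probabilities, classes)
macro_auc = sum(aucs.values()) / len(aucs)  # unweighted mean (assumed aggregation)
print(aucs, macro_auc)
```

The pairwise loop is quadratic in the number of cases, which is acceptable for illustration; a rank-based implementation would be preferable at the scale of the cohort described here.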
Remarkably, longitudinal application of Heme-xAIGen in patients with antecedent myeloid conditions assigned higher AML risk scores to patients with subsequent progression to AML (P < 0.05).
Conclusions: We introduce Heme-xAIGen, an explainable AI framework that accurately classifies hematologic malignancies by integrating genomic profiles, clonal dominance through VAF ranking, and immune repertoire data. Our approach demonstrates substantial predictive accuracy and delivers transparent diagnostic rationales. By effectively distinguishing complex lymphoid and myeloid malignancies and identifying disease progression, our model addresses critical diagnostic gaps, supports clinical decision-making, and facilitates patient monitoring.