Introduction The 2022 WHO/ICC classifications of myeloid neoplasms (MNs) incorporated genomics into the existing clinicopathologic framework, defining new (mono)genetic entities. However, this iterative approach produced a mix of pathologically and genomically defined entities that rely on arbitrary thresholds and historically derived boundaries. We hypothesised that integrating genomics with other features, de novo from first principles, might better incorporate modern molecular knowledge to define the biological continuum, evolutionary links and overlapping features across MNs.

Methods The discovery cohort comprised 6484 thoroughly annotated pts. Contemporary diagnoses included 1817 AML, 1695 MDS, 737 MPN, and 1101 MDS/MPN. Non-malignant pts (AA/CHIP/CCUS/ICUS) were included to explore beyond current neoplastic boundaries. In total, 1407 pts had ≥1 follow-up sample. Validation was performed in an independent international cohort (n=2012). An unsupervised random forest algorithm was used to integrate status (and VAF) for 28 mutations, karyotypes, CBC features, sex and age (144 parameters in total). Features were selected for their potential to be independent of BM sampling. Proximity-based multidimensional scaling and PCA were used to visualize latent structure and identify variance-driving features.
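The abstract does not describe how proximities were computed. As a minimal sketch, a common convention for random forests is to define the proximity of two samples as the fraction of trees in which they fall in the same terminal node, and to pass 1 minus that proximity to multidimensional scaling as a dissimilarity. The function names and the toy leaf assignments below are hypothetical and are not taken from the study.

```python
def proximity_matrix(leaf_ids):
    """Random-forest proximity from per-tree leaf assignments.

    leaf_ids[t][i] is the terminal-node (leaf) index that sample i
    reaches in tree t. Proximity of samples i and j is the fraction
    of trees in which they share a leaf.
    """
    n_trees = len(leaf_ids)
    n_samples = len(leaf_ids[0])
    counts = [[0] * n_samples for _ in range(n_samples)]
    for tree in leaf_ids:
        for i in range(n_samples):
            for j in range(n_samples):
                if tree[i] == tree[j]:
                    counts[i][j] += 1
    return [[c / n_trees for c in row] for row in counts]


def dissimilarity_matrix(prox):
    """Convert proximities to dissimilarities (1 - proximity) for MDS."""
    return [[1.0 - p for p in row] for row in prox]


# Hypothetical toy input: 3 trees, 4 samples (assumption, not study data).
toy_leaf_ids = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 3, 3],
]
toy_prox = proximity_matrix(toy_leaf_ids)
toy_dissim = dissimilarity_matrix(toy_prox)
```

In this toy input, samples 2 and 3 share a leaf in every tree and so have proximity 1, whereas samples 0 and 2 never share a leaf and have proximity 0.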

Results The model projected pts onto a continuous 3-dimensional space. We first applied clustering seeking 1st-order structure, revealing 4 major clusters (C1–C4). These were further subdivided into 58 distinct subclusters (SCs) by an unsupervised iterative clustering approach.

All SCs were enriched for distinctive features, with diverse associations linking genomics to clinicopathological/outcome features. Notably, contemporary diagnoses were distributed widely throughout this landscape, with 9/58 SCs containing cases spanning multiple WHO/ICC MN categories.

C1 displayed the most internal heterogeneity, with admixed AML, MDS and MDS/MPN pts. It naturally subdivided down to 4th-order subclustering. Ten SCs were dominated by splicing factor mutations. For example, SF3B1MT segregated into 3 SCs, alone or with high-VAF TET2/ASXL1 or DNMT3A mutations. A separate TET2/SRSF2 SC was associated with elevated monocyte%, reflecting a discrete CMML-phenotype genomic entity. NPM1MT formed a distinct SC with FLT3-ITD, distinguished further by VAFs of FLT3, TET2, and NRAS. Current genomically defined AML subtypes separated at the 4th order of subclustering.

C2 encompassed classical MPN driver mutations across 6 SCs, including a distinct CALRMT group and others defined by variable JAK2 VAFs, co-mutation patterns and other CBC features: e.g. high-VAF SF3B1MT correlated with PB blast%, and concurrent TET2MT with higher Hb/monocyte counts.

C3 was characterized by TP53 alterations, with 5 SCs differing by TP53MT burden, 17p lesions, karyotypic complexity and age. Two SCs, one marked by high TP53 VAF plus complex karyotype, the other by chromosome 17 lesions, were associated with the worst OS in the whole cohort.

C4 contained most known non-malignant pts, dividing into 4 SCs: 2 largely comprised CCUS and AA, while 2 reflected overlap/convergence of AA with MDS and with MPN, respectively.

Repurposing single-cell pseudotime analysis suggested multiple disease origins, with various progression trajectories converging on C3. When projected onto the model, longitudinal samples moved along these derived paths, consistent with their known progression/remission changes, validating the model's biological relevance and predictive utility.

Remarkably, the 58 SCs displayed distinct risks of clonal progression and survival. To maximize potential clinical utility, we created a pan-MN prognostic score integrating these features. It performed robustly (c-index 0.84), matching or exceeding existing prognostic tools, and was validated in an independent dataset.
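For readers unfamiliar with the c-index, the sketch below computes Harrell's concordance index on right-censored survival data. The abstract does not state which estimator was used, so this is an assumption for illustration only; the function name and the toy values are hypothetical. A pair of pts is comparable when the one with the shorter follow-up had an observed event, and it is concordant when that patient also received the higher risk score.

```python
def concordance_index(times, events, risks):
    """Harrell's c-index for right-censored data.

    times:  follow-up time per patient
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  prognostic score, higher meaning worse expected outcome
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data (assumption, not study data).
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.4, 0.6, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)  # 4 of 5 pairs concordant: 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of the comparable pairs.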

Conclusion Integrating clinicopathologic and genomic data, we developed ab initio a continuum model of MNs that transcends historical diagnostic boundaries, captures disease evolution and stratifies clinical outcomes. It may better reflect the core biology and natural histories of specific cases, placing more rational boundaries between discrete entities. Potential advantages extend to treatment selection and trial design (in the era of molecularly targeted therapies) and to personalised prognostication throughout each disease course. Expanded validation is needed to resolve complexity in some areas (notably in C2/C4) and to validate and refine this prototype framework.
