Key Points
Germ line RUNX1, GATA2, and DDX41 HHMs are associated with driver somatic variants during leukemogenesis that are unique to each syndrome.
Ongoing molecular monitoring of germ line carriers without HM is needed to assess the risk profile and clinical actionability of somatic markers.
Abstract
Individuals with germ line variants associated with hereditary hematopoietic malignancies (HHMs) have a highly variable risk for leukemogenesis. Gaps in our understanding of premalignant states in HHMs have hampered efforts to design effective clinical surveillance programs, provide personalized preemptive treatments, and inform appropriate counseling for patients. We used the largest known comparative international cohort of germline RUNX1, GATA2, or DDX41 variant carriers without and with hematopoietic malignancies (HMs) to identify patterns of genetic drivers that are unique to each HHM syndrome before and after leukemogenesis. These patterns included striking heterogeneity in rates of early-onset clonal hematopoiesis (CH), with a high prevalence of CH in RUNX1 and GATA2 variant carriers who did not have malignancies (carriers-without HM). We observed a paucity of CH in DDX41 carriers-without HM. In RUNX1 carriers-without HM with CH, we detected variants in TET2, PHF6, and, most frequently, BCOR. These genes were recurrently mutated in RUNX1-driven malignancies, suggesting CH is a direct precursor to malignancy in RUNX1-driven HHMs. Leukemogenesis in RUNX1 and DDX41 carriers was often driven by second hits in RUNX1 and DDX41, respectively. This study may inform the development of HHM-specific clinical trials and gene-specific approaches to clinical monitoring. For example, trials investigating the potential benefits of monitoring DDX41 carriers-without HM for low-frequency second hits in DDX41 may now be beneficial. Similarly, trials monitoring carriers-without HM with RUNX1 germ line variants for the acquisition of somatic variants in BCOR, PHF6, and TET2 and second hits in RUNX1 are warranted.
Introduction
Hereditary hematopoietic malignancies (HHMs) are hematologic syndromes characterized by Mendelian inheritance patterns and an increased lifetime risk for hematopoietic malignancies (HMs).1,2 Individuals with HHM-associated germ line variants have a highly variable risk for leukemogenesis, and many HHM-variant carriers do not develop malignancies (carriers-without HM).3 Very little is understood about the premalignant states in carriers-without HM, the molecular and genetic factors that affect leukemogenic risk, or the environmental factors that drive leukemogenesis in HHMs. This knowledge gap has hampered efforts to refine the clinical surveillance of carriers-without HM, identify individuals with the highest risk for HMs, and develop interventions that delay or prevent leukemogenesis in high-risk carriers-without HM. Moreover, treatments used for malignancies in HHM-variant carriers (carriers-with HM) are not tailored to these syndromes aside from DDX41 and GATA2 carriers, for which there is a limited role for lenalidomide therapy or prophylactic hematopoietic stem cell transplant, respectively.4-6 Instead, carriers-with HM are treated with standard-of-care therapies for sporadic HMs, which may carry an uncharacterized gene mutation–specific risk of additional treatment effects, such as engraftment failure or secondary therapy-related neoplasms. Given the paucity of HHM families at individual institutions, a coordinated, multi-institutional effort is required to understand the natural history of HHMs, leukemogenic mechanisms, and the unique biologic factors that may be present in individual HHM syndromes.
HHMs have been recognized phenotypically for over 100 years. Autosomal dominant (AD) predisposition to myeloid malignancies is the best characterized, with more than 15 AD HHM-related genes identified to date.7 Pathogenic germ line variants in RUNX1, GATA2, and DDX41 collectively represent the most common causes of AD HHMs and are primarily associated with myeloid malignancies. These HHMs are more common than previously recognized and may have highly penetrant leukemogenic phenotypes. Germ line DDX41 carriers account for ∼2% to 4% of all patients with seemingly sporadic HMs, and GATA2 carriers have a 90% lifetime risk of developing HMs. RUNX1-driven HHMs were the first known HHM syndrome and have a high penetrance for HM (∼44%).3,4,8-11 Identifying these syndromes can be challenging because of limited syndromic features, and recognition is often based on a high-risk family history, an early-onset HM, or the identification of an HHM-associated variant on tumor-based molecular profiling.12 Individuals harboring germ line variants in these genes often present with cytopenias: RUNX1 most commonly with thrombocytopenia;13 GATA2 with monocytopenia and deficiency of dendritic cells and B and natural killer (NK) lymphoid cells;14 and DDX41 with variable cytopenias that can include leukopenia, neutropenia, and/or erythroid dysplasia.15,16 The age at myelodysplastic syndrome/acute myeloid leukemia (MDS/AML) diagnosis also differs between HHMs, with GATA2 carriers developing MDS/AML at a mean age of 19 years, RUNX1 carriers at 29 years, and DDX41 carriers at 67 years.16,17
The mechanisms driving leukemogenesis in these variant carriers are unclear. Most work to date has focused on germ line RUNX1 variant carriers.8 RUNX1 carriers have an increased risk for clonal hematopoiesis (CH), with CH reported in 67% to 75% of carriers.18,19 However, because of the rarity of RUNX1 HHM, single-center studies have had limited numbers of patients available (9 patients19 and 3 patients18). Recent studies looking at CH in germ line GATA2 carriers have shown an association between CH and a hypocellular marrow while also linking specific somatic events with the likelihood of leukemic transformation.20,21 Similarly, assessment of CH patterns in comparative cohorts of HHM-variant carriers may identify specific leukemogenic patterns for different HHMs and ultimately inform clinical trials to define guidelines for clinical surveillance of unaffected HHM-variant carriers.
To address this knowledge gap, we collected retrospective next-generation sequencing (NGS) data from hematopoietic tissue samples from an international cohort of patients with HHM driven by germ line RUNX1, GATA2, or DDX41 variants. Our cohort is the largest comparative HHM-focused cross-sectional collection of its kind, with 240 patient samples evenly distributed between carriers-without HM (n = 120) and carriers-with HM (n = 120). We used a uniform variant calling and curation approach to identify driver somatic variants in each sample. This unique distribution of samples from carriers-without HM and carriers-with HM and across multiple HHMs, in conjunction with a uniform bioinformatic approach, enabled us to determine driver somatic variants that develop within hematopoietic tissue in RUNX1, GATA2, and DDX41 variant carriers before and after diagnosis of a blood cancer in the HHM syndromes.
Patients and methods
Patient cohort
Clinical and genomics data from germ line RUNX1, GATA2, or DDX41 variant carriers were collected from the RUNX1 database (https://runx1db.runx1-fpd.org/),22 the Centre for Cancer Biology (Australia), the University of Chicago (USA), and the National Institutes of Health (USA). In total, data from 195 patients who had undergone genomics profiling (whole-exome sequencing or panel-based sequencing) were retrospectively collated to form the RUNX1, GATA2, and DDX41 cohorts. All procedures in this study involving human participants were performed in accordance with the Declaration of Helsinki. Studies were approved by institutional human research ethics committees and/or institutional research boards. All participants signed an informed consent form to share genomics and protected health information.
NGS reanalysis and variant calling pipeline
NGS data were collected and reanalyzed with the bioinformatics pipeline used for the RUNX1 database.22 Original FASTQ (textfile format for sequencing data) or Binary Alignment Map (BAM) files were obtained. Sequence reads were aligned to the GRCh37 (hs37d5) human reference genome with BWA-MEM (ver 0.7.12).23 Sambamba (ver 0.6.5)24 was used for marking polymerase chain reaction duplicates, and GATK (ver 3.8-1) was used to recalibrate base-quality scores. Freebayes (ver 1.2)25 was used to call single nucleotide variants (SNVs) and insertions/deletions (indels). Variant-, gene-, and protein-level annotations were performed using an in-house pipeline (https://github.com/SACGF/variantgrid). Somatic variant curation was performed as previously described22 (supplemental Methods). All data sets were independently curated by at least 3 variant curators.
Tumor mutation burden (TMB) analysis
SNVs and indels were identified with Seurat, Shimmer, Strelka, and SomaticSniper,26-29 using paired germ line samples (cultured skin fibroblasts or hair) from the same patient to remove germ line variants. Somatic variants identified in 3 or more callers were included with high confidence. Variant calling thresholds were set at alternate allelic depth ≥3 and variant allele frequency (VAF) ≥ 5%. Somatic variants were filtered and annotated with the variant effect predictor package (hg19). The total number of somatic variants in the tumor exome was divided by the length of exome capture (38 Mb) to calculate the TMB.
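The filtering and TMB arithmetic described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `SomaticCall` record layout, the function names, and the example calls are hypothetical, and only the thresholds (3 or more callers, alternate allelic depth ≥3, VAF ≥5%) and the 38 Mb capture length come from the text.

```python
from dataclasses import dataclass

# Thresholds and capture length as stated in the text.
EXOME_CAPTURE_MB = 38.0
MIN_CALLERS = 3
MIN_ALT_DEPTH = 3
MIN_VAF = 0.05


@dataclass(frozen=True)
class SomaticCall:
    """One candidate somatic variant (hypothetical record layout)."""
    variant_id: str
    callers: frozenset  # names of the callers that reported the variant
    alt_depth: int      # reads supporting the alternate allele
    vaf: float          # variant allele frequency as a fraction (0-1)


def is_high_confidence(call: SomaticCall) -> bool:
    """Apply the caller-agreement, depth, and VAF thresholds."""
    return (len(call.callers) >= MIN_CALLERS
            and call.alt_depth >= MIN_ALT_DEPTH
            and call.vaf >= MIN_VAF)


def tumor_mutation_burden(calls, capture_mb: float = EXOME_CAPTURE_MB) -> float:
    """Somatic variants passing the filters, per megabase of exome capture."""
    kept = [c for c in calls if is_high_confidence(c)]
    return len(kept) / capture_mb


# Made-up calls, for illustration only.
example_calls = [
    SomaticCall("var1", frozenset({"Seurat", "Shimmer", "Strelka"}), 12, 0.21),
    # Reported by only 2 callers, so it is excluded.
    SomaticCall("var2", frozenset({"Strelka", "SomaticSniper"}), 9, 0.15),
    # Alternate depth and VAF are both below threshold, so it is excluded.
    SomaticCall("var3", frozenset({"Seurat", "Shimmer", "Strelka", "SomaticSniper"}), 2, 0.04),
]
example_tmb = tumor_mutation_burden(example_calls)
```

In this toy input only one call survives filtering, so the result is 1 variant divided by 38 Mb.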
Statistical analysis
GraphPad Prism 7.03 and RStudio Version 1.4.17 with tidyverse, ggplot2, ggrepel, caTools, and ROCR packages were used for statistical calculations and figures. ProteinPaint was used to create lollipop plots.30 Circos plots were created using ShinyCircos software.31 Unless otherwise stated, P values were calculated using a one-way analysis of variance with Tukey multiple comparisons test using a single pooled variance. P values for sex differences were calculated using a two-sided Fisher exact test. The prop.test function/z-value was used as a 2-sample test for equality of proportions with continuity correction. Logistic regression modeling was used to determine the relationship between age and CH. The nonparametric Mann-Whitney U test was used to calculate the significance of the difference in TMB between the RUNX1 and DDX41 cohorts.
Results
Genomic cohorts for germ line RUNX1, GATA2, or DDX41 HHMs
Through international data sharing, we created cohorts of carriers-without HM (no HM) and carriers-with HM (diagnosed with an HM) with germ line RUNX1, GATA2, or DDX41 variants (Figure 1; supplemental Methods). NGS data included samples from germ line controls, patients in complete remission, carriers-without HM, and carriers-with HM. Multiple samples, including longitudinal samples, were collected from individuals when available. The RUNX1 cohort included 66 carriers-without HM and 52 carriers-with HM (80 and 66 independent NGS samples, respectively). The GATA2 cohort included 9 carriers-without HM and 13 carriers-with HM (9 and 13 NGS samples, respectively). The DDX41 cohort included 22 carriers-without HM and 29 carriers-with HM (31 and 41 independent NGS samples, respectively). Each cohort is summarized in supplemental Table 1 and supplemental Figures 1A and 2. We used a standardized bioinformatics and variant curation approach22 to identify clinically relevant and potentially clinically relevant somatic variants (driver somatic variants, detailed in the supplemental Methods).
CH is prevalent in RUNX1 and GATA2 but not DDX41 HHM carriers-without HM
Age-related CH is frequently observed in healthy populations, and the prevalence of CH in HHMs is an area of active investigation.18-21,32-34 We evaluated our cross-sectional cohorts of RUNX1, GATA2, and DDX41 carriers-without HM for CH-related variants at the time of sample collection (Table 1; Figure 2A). We identified CH in 35% of RUNX1 carriers-without HM (23 of 66 individuals; Figure 2A,C) and 22% of GATA2 carriers-without HM (2 of 9 individuals; Figure 2A; supplemental Figure 1B). The prevalence of CH was significantly lower (3%, 1 of 31 individuals, P = .002; Figure 2A; supplemental Figure 1C) in DDX41 carriers-without HM. The reduced prevalence of CH in the DDX41 cohort was independent of age, because the age distributions of samples overlapped between HHM cohorts (supplemental Figure 1A). For germ line RUNX1, CH was identified in all age groups, and the prevalence of CH significantly increased with age (Figure 2C, P = .0267, logistic regression). In the RUNX1 cohort, 5 of 6 (83%) individuals 60 years of age and older had at least 1 CH variant (Figure 2C). The number of variants increased with age: 92% of individuals under the age of 50 years with CH had only 1 CH variant, whereas 71% of patients over the age of 50 years had 2 or more CH variants (P = .001, Figure 2C,E). The median VAF of CH variants did not change significantly with age (Figure 2D). For all cohorts, no CH was identified in any individual younger than 16 years (n = 9).
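As a worked illustration of the 2-sample test for equality of proportions named in the statistical methods, the sketch below compares the reported CH counts in RUNX1 (23 of 66) and DDX41 (1 of 31) carriers-without HM. It is an assumption that the reported P = .002 came from this particular comparison and test; the text does not say so explicitly. The function name is hypothetical, and the calculation is a plain-Python version of the Yates-corrected chi-square statistic that R's prop.test uses by default for two groups.

```python
import math


def two_sample_proportion_test(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided P value for equality of two proportions.

    Chi-square statistic (1 degree of freedom) with Yates continuity
    correction on the 2x2 table of events and non-events.
    Assumes no row or column total is zero.
    """
    a, b = x1, n1 - x1
    c, d = x2, n2 - x2
    n = n1 + n2
    # Continuity correction, capped so the corrected difference is never negative.
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


# Counts as reported in the text: CH in 23 of 66 RUNX1 and 1 of 31 DDX41 carriers-without HM.
p_runx1_vs_ddx41 = two_sample_proportion_test(23, 66, 1, 31)
```

Under these assumptions the function returns a P value of roughly .002, in line with the value reported above.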
| Individual | Clinical presentation | Age | Sex | Gene | Genomic coordinates | c_HGVS | p_HGVS | VAF (%) |
|---|---|---|---|---|---|---|---|---|
| M09_1 | Thrombocytopenia | 16 | F | BCOR | X:39933270 CA>C | NM_001123385.2:c.1328del | p.Leu443CysfsTer8 | 8.3 |
| M01_3 | Thrombocytopenia | 17 | M | ATP10A | 15:25953381 G>A | NM_024490.3:c.2411C>T | p.Ala804Val | 5.0 |
| M03_2 | Thrombocytopenia | 17 | M | PHF6 | X:133527608 CTG>C | NM_032458.3:c.321_322del | p.Ala108IlefsTer3 | 39.4 |
| | | | | CUX1 | 7:101882720 G>A | NM_001202543.2:c.3776G>A | p.Arg1259Gln | 42.3 |
| | | | | IDH2 | 15:90631934 C>T | NM_002168.3:c.419G>A | p.Arg140Gln | 9.7 |
| G01_2 | Thrombocytopenia | 19 | F | NOTCH3 | 19:15276860 GC>G | NM_000435.3:c.5404del | p.Ala1802LeufsTer23 | 3.2 |
| V01_1 | Thrombocytopenia | 23 | M | EP300 | 22:41574723 TCCACACCACGTTTCC>T | NM_001429.3:c.7014_7028del15 | p.His2338_Pro2342del | 29.6 |
| G07_3 | Thrombocytopenia | 33 | M | BCOR | X:39932334 G>GT | NM_001123385.2:c.2264dup | p.Tyr755Ter | 2.4 |
| M06_2 | Thrombocytopenia | 37 | F | TET2 | 4:106164068 G>A | NM_001127208.3:c.3578G>A | p.Cys1193Tyr | 11.3 |
| G07_2 | Thrombocytopenia | 40 | M | TET2 | 4:106164787 C>T | NM_001127208.3:c.3655C>T | p.His1219Tyr | 5.3 |
| S_G_3 | Thrombocytopenia | 40 | F | TET2 | 4:106196461 T>G | NM_001127208.3:c.4794T>G | p.Tyr1598Ter | 5.0 |
| W01_3 | Thrombocytopenia | 40 | M | BCOR | X:39921617 CG>C | NM_001123385.2:c.4202del | p.Pro1401ArgfsTer83 | 20.8 |
| S_D_2 | Thrombocytopenia | 43 | M | DNMT3A | 2:25463242 AG>A | NM_022552.5:c.2250del | p.Phe751SerfsTer28 | 3.1 |
| | | 48 | M | DNMT3A | 2:25463242 AG>A | NM_022552.5:c.2250del | p.Phe751SerfsTer28 | 4.9 |
| U02_1 | Thrombocytopenia | 49 | M | TET2 | 4:106162587 G>A | NM_001127208.3:c.3500+1G>A | p.? | 2.6 |
| A01_1 | Thrombocytopenia | 53 | F | TET2 | 4:106158510 T>C | NM_001127208.3:c.3409+2T>C | p.? | 33.3 |
| | | | | DNMT3A | 2:25497943 CG>C | NM_022552.5:c.505del | p.Arg169GlyfsTer56 | 29.6 |
| | | | | SRSF2 | 17:74732959 G>A | NM_003016.4:c.284C>T | p.Pro95Leu | 22.9 |
| | | 56 | F | DNMT3A | 2:25497943 CG>C | NM_022552.5:c.505del | p.Arg169GlyfsTer56 | 42.8 |
| | | | | TET2 | 4:106158510 T>C | NM_001127208.3:c.3409+2T>C | p.? | 30.5 |
| | | | | SRSF2 | 17:74732959 G>A | NM_003016.4:c.284C>T | p.Pro95Leu | 28.0 |
| G02_2 | Thrombocytopenia | 55 | M | BCOR | X:39932898 T>TG | NM_001123385.2:c.1700dup | p.Ala568SerfsTer43 | 6.7 |
| | | | | BCOR | X:39923055 C>T | NM_001123385.2:c.3653G>A | p.Trp1218Ter | 4.3 |
| | | | | BCOR | X:39911577 GA>G | NM_001123385.2:c.5052del | p.Pro1685GlnfsTer40 | 2.5 |
| | | 54 | M | BCOR | X:39932898 T>TG | NM_001123385.2:c.1700dup | p.Ala568SerfsTer43 | 6.4 |
| | | | | BCOR | X:39923055 C>T | NM_001123385.2:c.3653G>A | p.Trp1218Ter | 2.2 |
| | | | | BCOR | X:39933676 TG>T | NM_001123385.2:c.922del | p.Gln308ArgfsTer70 | 1.2 |
| | | | | BCOR | X:39933416 TG>T | NM_001123385.2:c.1182del | p.Lys395ArgfsTer47 | 0.7 |
| X01_1 | Asymptomatic | 60 | F | BCOR | X:39932109 ACT>A | NM_001123385.2:c.2488_2489del | p.Ser830CysfsTer6 | 17.6 |
| W01_2 | Thrombocytopenia | 68 | M | BCOR | X:39933593 A>AG | NM_001123385.2:c.1005dup | p.Ser336LeufsTer45 | 7.1 |
| I03_2 (BM) | Thrombocytopenia | 72 | F | DNMT3A | 2:25470029 T>C | NM_022552.5:c.1015-2A>G | p.? | 13.3 |
| | | | | BCOR | X:39933492 TG>T | NM_001123385.2:c.1106del | p.Ser369Ter | 4.0 |
| F01_6 | Thrombocytopenia | 76 | M | BCOR | X:39916476 C>T | NM_001123385.2:c.4527G>A | p.Trp1509Ter | 13.6 |
| | | | | ATM | 11:108213949 G>A | NM_000051.3:c.8269G>A | p.Val2757Met | 17.0 |
| | | | | GRIN2A | 16:9857831 G>C | NM_000833.4:c.3570C>G | p.His1190Gln | 14.5 |
| F01_8 | Thrombocytopenia | 76 | M | BCOR | X:39921490 TG>T | NM_001123385.2:c.4329del | p.Thr1444ProfsTer40 | 34.7 |
| | | | | TP53 | 17:7578442 T>C | NM_000546.6:c.488A>G | p.Tyr163Cys | 4.1 |
| | | | | CCND3 | 6:41903707 G>A | NM_001760.4:c.850C>T | p.Pro284Ser | 3.5 |
| U02_3 | Asymptomatic | NA | M | TET2 | 4:106162587 G>A | NM_001127208.3:c.3500+1G>A | p.? | 1.2 |
| S_E_4 | Thrombocytopenia | NA | F | SRSF2 | 17:74732959 G>T | NM_003016.4:c.284C>A | p.Pro95His | 13.9 |
| | | | | ATR | 3:142281940 A>G | NM_001184.4:c.304T>C | p.Trp102Arg | 3.9 |
| D01_2 | Thrombocytopenia | NA | M | DNMT3A | 2:25466797 C>T | NM_022552.5:c.1906G>A | p.Val636Met | 24.7 |
| | | | | BCOR | X:39923092 TA>T | NM_001123385.2:c.3615del | p.Lys1207AsnfsTer31 | 3.6 |
| | | | | DNMT3A | 2:25464460 C>T | NM_022552.5:c.2053G>A | p.Gly685Arg | 3.2 |
| | | | | BCOR | X:39933373 TGCCCGG>TT | NM_001123385.2:c.1220_1225delCCGGGCinsA | p.Pro407GlnfsTer31 | 3.1 |
| D02_2 | Thrombocytopenia | NA | F | PTPN11 | 12:112926887 G>A | NM_002834.4:c.1507G>A | p.Gly503Arg | 25.2 |
| U02_4 | NA | NA | F | TET2 | 4:106157215 C>T | NM_001127208.3:c.2116C>T | p.Gln706Ter | 7.3 |
| Family_53_8 | Asymptomatic | 16.5 | F | KDM5A | 12:416952 C>CT | NM_001042603.3:c.3597dup | p.Gly1200ArgfsTer7 | 3.2 |
| Family_53_3 | Asymptomatic | 47 | M | DNMT3A | 2:25457171 T>A | NM_022552.5:c.2716A>T | p.Lys906Ter | 3.6 |
| Family_0127.041 | Asymptomatic | 87 | F | ASXL1 | 20:31022576 TAC>T | NM_015338.5:c.2062_2063del | p.Thr688fs29 | 4.2 |
| | | | | DNMT3A | 2:25467022 A>G | NM_022552.5:c.1851+2T>C | p.? | 4.1 |
CH is increased in RUNX1 carriers-without HM relative to population controls
We then compared the prevalence of CH in our cohort of RUNX1 carriers-without HM to population controls from Jaiswal et al and Genovese et al (n = 27 783).32,33 The prevalence of CH was higher in RUNX1 carriers-without HM in every age group (Figure 3A, Z test of proportions, P < .0001). The prevalence of CH was 0.2% in controls between the ages of 19 and 29 years but was 22.2% in RUNX1 carriers-without HM in the same age group. In individuals aged 60 years or older, CH was detectable in 7% of controls and 83% of RUNX1 carriers-without HM, demonstrating that RUNX1 carriers-without HM have an increased prevalence of CH at all ages compared with population controls. We investigated the frequency of variants in prototypical CH genes. Variants in the epigenetic regulators DNMT3A (54%), TET2 (29%), and ASXL1 (8%) are the most frequent CH-related genes in the general population.32,35,36 Surprisingly, the most frequently mutated CH-related gene in RUNX1 carriers-without HM was BCOR (42%), which is mutated in only 0.6% of population controls with CH (Figure 3B, P < .0001).32,35,36 DNMT3A was mutated in 17% (P < .0001), TET2 in 14% (P = .2182), and ASXL1 was not mutated in RUNX1 carriers-without HM with CH (Figure 3B). These findings demonstrate that the mechanism of CH in RUNX1 carriers-without HM is distinct for this syndrome as compared with population controls.
Clonal structure and evolution in RUNX1 carriers-without HM
Further examining the clonal composition of somatic variants in carriers, we postulated the order of mutation acquisition in samples with multiple variants by using relative VAFs (Figure 4A). We observed that BCOR, as well as being the most frequently mutated gene, was also present across the entire age spectrum, from carriers as young as 16 years to 76 years, found as a first hit (Figure 4A; Table 1). Consistent with the overall data, there was a general increase in BCOR VAF with age and additional mutations, which could be both additional BCOR mutations as well as mutations in other genes including TP53 and ATM, with 1 case where DNMT3A was antecedent to BCOR variants (Figure 4A; Table 1). Three RUNX1 carriers-without HM had longitudinal peripheral blood samples available, which allowed us to track the temporal evolution of CH (Figure 4B). Case 1: a male with thrombocytopenia and a germ line RUNX1 p.R169I variant had a somatic DNMT3A p.F751fs variant detected at a VAF of 3.1% at 43 years of age. The clone increased to a VAF of 5.0% over 5 years without any clinical-level changes, including leukemogenesis, or the development of additional clones. Case 2: a female with thrombocytopenia and a germ line RUNX1 p.R320∗ variant who developed a TET2 p.Y1598∗ somatic variant that persisted for more than 7 years, increasing from a VAF of <1% to 5%, without clinical changes. Case 3: a female with a germ line RUNX1 c.351+1G>A splicing variant and thrombocytopenia who developed AML 3 years later. We identified 3 somatic variants (DNMT3A, SRSF2, and TET2) in the patient’s initial sample, collected at age 53 years. These variants persisted for 2 years with persistent thrombocytopenia but no leukemogenesis. The patient then developed AML with additional somatic RUNX1 and STAG2 variants at 56 years of age. The initial DNMT3A and SRSF2 CH-related variants remained stable, whereas the TET2 variant was outcompeted during leukemogenesis.
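The idea of postulating acquisition order from relative VAFs can be sketched in a few lines. This is a simplified illustration, not the authors' method: the `Variant` type and function name are hypothetical, and ranking by raw VAF ignores copy number, zygosity, and X-linkage (BCOR is on the X chromosome, so in males its VAF relates differently to clone size than an autosomal heterozygous variant does). The VAFs are those listed in Table 1 for individuals F01_8 and I03_2.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    vaf_percent: float


def postulate_acquisition_order(variants):
    """Return genes from presumed earliest to latest hit (highest VAF first)."""
    ranked = sorted(variants, key=lambda v: v.vaf_percent, reverse=True)
    return [v.gene for v in ranked]


# VAFs taken from Table 1.
f01_8 = [Variant("TP53", 4.1), Variant("BCOR", 34.7), Variant("CCND3", 3.5)]
i03_2 = [Variant("BCOR", 4.0), Variant("DNMT3A", 13.3)]

order_f01_8 = postulate_acquisition_order(f01_8)  # BCOR ranked first
order_i03_2 = postulate_acquisition_order(i03_2)  # DNMT3A ranked before BCOR
```

For F01_8 the ranking places BCOR ahead of TP53, and for I03_2 it places DNMT3A ahead of BCOR, matching the pattern described in the text.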
Somatic variants in germ line RUNX1, GATA2, and DDX41 malignancy samples
We next sought to define the landscape of driver somatic variants in our carriers-with HM cohorts who had developed malignancies. In the RUNX1 carriers-with HM cohort, at least 1 driver somatic variant was detected in 46 of 52 (88%) individuals diagnosed with an HM. No association between the number of driver somatic variants and the histologic subtype of malignancy was observed (supplemental Figure 2C). Driver somatic variants were identified in 64 unique genes, and 22 genes were mutated in more than 1 individual (Figure 5A; supplemental Figure 3A). Second hits in RUNX1 were the most frequent somatic mutations, with variants detected in 18 individuals (41% of patients with complete sequencing coverage of RUNX1 [supplemental Methods]). Three types of somatic RUNX1 variants were identified: small indels and SNVs (unique from the germ line variant, 72%), copy neutral loss of heterozygosity variants (17%), and trisomy 21 (somatic amplification of the germ line RUNX1 variant, 11%). Somatic second hits in RUNX1 included 12 missense variants in the exons coding for the RUNT domain as well as a splice-site variant (c.507_508+1dupAGG, Figure 6A-B). Cytogenetic analyses identified 2 individuals with +21 (VAF > 60%) and 3 individuals with a mutant VAF >80% (copy neutral loss of heterozygosity variants) (supplemental Figure 3A). We did not identify associations between individual germ line and driver somatic variant pairs. Most individuals (78%) with a somatic RUNX1 variant were female (P = .02, Figure 6C). Somatic RUNX1 variants were identified in all age groups, and no association was established between individual somatic RUNX1 variants and the age of HM diagnosis (Figure 6D). A female sex bias for HM was observed in all age groups (Figure 6D). Besides second hits in RUNX1, a series of established cancer genes were mutated in the HM cohort: PHF6 (21%), BCOR (20%), TET2 (13%), SH2B3 (11%), and SRSF2 (11%) (Figure 5A; supplemental Figure 3A). 
AML was the predominant malignancy in germ line RUNX1 variant carriers, with a sex bias for female AML diagnosis (23 of 29 females, 9 of 23 males, P = .004, supplemental Figure 4). Among individuals with somatic RUNX1 variants, 15 (83%) had AML, and 12 of 15 (80%) were females. These data from germ line RUNX1 variant carriers support a female sex bias for AML leukemogenesis driven by somatic RUNX1 variants.
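The sex-bias comparison can be reproduced from the counts given above with a two-sided Fisher exact test, the test named in the statistical methods for sex differences. The sketch below is a plain-Python version under the assumption that the comparison is AML versus other HM diagnoses by sex (23 of 29 females and 9 of 23 males with AML); the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k: int) -> float:
        # Hypergeometric probability of k events in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum every table at least as unlikely as the observed one
    # (the small tolerance guards against floating-point ties).
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))


# Counts from the text: AML in 23 of 29 females and 9 of 23 males.
p_sex_bias = fisher_exact_two_sided(23, 29 - 23, 9, 23 - 9)
```

With these counts the function returns a P value of about .004, consistent with the value reported above.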
No somatic second hits in GATA2 were detected in our cohort of 13 germ line GATA2 variant carriers (Figure 5B; supplemental Figure 3B). We detected at least 1 driver somatic variant in 69% (9 of 13) of germ line GATA2 variant carriers who had developed malignancies. Analysis of the GATA2 cohort was limited by low sample numbers (supplemental Figure 4A), but the lack of second hits in GATA2 suggests biallelic variants are not a common leukemogenic mechanism in germ line GATA2 variant carriers.37
In the DDX41 carriers-with HM cohort, we identified at least 1 driver somatic variant in 10 unique genes in 18 individuals (62%, Figure 5C; supplemental Figure 3C). Only 3 genes were mutated in more than 1 individual (DDX41, ASXL1, and JAK2 p.Val617Phe). The most frequent somatic event was a second hit in DDX41, which was observed in 62% (n = 18) of individuals with HM. Apart from a single splice-site variant (c.1621+1G>A), all somatic DDX41 variants were missense variants in the DEAD-box domain (3 of 18) or the recurrent p.R525H variant in the helicase C domain (14 of 18 DDX41 somatic variants, 78%, Figure 7A-B). We observed a significant sex bias for DDX41 malignancies (3:1 male:female, P = .0002), which correlated with males presenting with a somatic DDX41 variant (14 of 18 males, Figure 7C-D). No association between specific somatic variants and germ line DDX41 variants, age of malignancy diagnosis, or histologic subtype of malignancy was observed.
Mutational burden in germ line RUNX1, GATA2, or DDX41 malignancy samples
To better understand the somatic mutational burden in each syndrome, we evaluated the VAF of all driver somatic variants. A large distribution of VAFs was observed in RUNX1 carriers-with HM (median VAF = 22.4%, mean = 27.0%, mode = 5.7, 34%) and GATA2 carriers-with HM (median = 21.0%, mean = 20.6%, mode = 8, 27.2%). VAFs among RUNX1 carriers-with HM showed the largest distribution (Figure 5D). The DDX41 cohort harbored low VAF driver somatic variants (median VAF = 8.9%, mean 13.4%, Figure 5D). No association between age at malignancy diagnosis and the VAF of driver somatic variants was observed in any cohort (supplemental Figure 5). TMB was calculated for DDX41 (n = 14) and RUNX1 (n = 4) carriers-with HM with matched germ line/tumor samples. DDX41 had a lower TMB (0.75 mutations/Mb) than RUNX1 malignancies (3.3 mutations/Mb; P = .01, Figure 5E).
Discussion
The prevalence of HHMs is estimated to range from 7% to 14% in cohorts of patients with myeloid malignancies.38,39 Although the clinical recognition of these syndromes has improved since RUNX1-driven HHMs were first described,8 questions remain regarding the optimal approach to monitoring carriers-without HM and how malignancy-directed treatments may be individualized for affected patients. Currently it is challenging for clinicians to provide tailored risk-assessment to patients as the natural history of carriers-without HM is not well understood and there has been no approach to identify HHM individuals at highest risk for leukemogenesis. To address this gap, we have leveraged our HHM international collaborative network and assembled and characterized the most extensive cross-sectional comparative cohort of carriers-without HM and carriers-with HM germ line RUNX1, GATA2, or DDX41 variants (n = 191, 102 probands, Figure 1). We demonstrate RUNX1, GATA2, and DDX41 germ line variant carriers experience highly variable risk for CH and unique somatic drivers during CH relative to population controls. Each HHM is remarkable for mutational profiles during frank leukemogenesis that are also unique to each HHM syndrome.
The most significant risk factor for CH in the general population is aging, with ∼10% of individuals over the age of 70 years having detectable CH.32,33 Several studies investigating CH in the background of inherited bone marrow failure have shown an increased risk for CH.34,40,41 Interestingly, individuals without HM with HHM germ line variants have been shown to have variable risk for CH in a series of small studies (ANKRD26, ETV6, RUNX1).18,19,42,43 We have now performed the largest collective analysis of CH in RUNX1, GATA2, and DDX41 carriers-without HM to date. This analysis extends studies of CH in the HHMs to novel phenotypes (DDX41) and suggests that HHM predisposition in GATA2 and RUNX1 carriers-without HM may be driven by early-onset CH (22.2% in RUNX1 and 25% in GATA2). Recently, studies of larger cohorts of patients with germ line GATA2 variants without HM20,21 have also shown that CH is common in these patients and is associated with a hypocellular marrow. Further investigation is required to determine whether CH also correlates with cytopenias in germ line RUNX1 cohorts. In contrast, DDX41 carriers-without HM have a very low risk for CH at any age. RUNX1 carriers-without HM with CH also had unique somatic drivers relative to CH population controls, most notably a high prevalence of BCOR variants.32,33 This resembles aplastic anemia, in which BCOR and BCORL1 are frequently mutated.44 BCOR variants alone did not appear sufficient to cause leukemogenesis in our cohort. This suggests that additional cooperating variants are required for progression to malignancy (including somatic RUNX1, TET2, DNMT3A, and BCORL1 variants [supplemental Figure 3A]).
Some of these interactions have been validated in vivo: in conditional Bcor knockout mouse models, the addition of variants in Dnmt3a, Kras, or Tet2 is sufficient to drive malignant transformation.45-47 Notably, BCOR variants are frequent in the RUNX1 carriers-with HM cohort, supporting the notion that CH in this setting is a risk factor for leukemic transformation. However, further models are required to determine the functional effects of co-occurring BCOR and RUNX1 variants on hematopoietic stem and progenitor cell (HSPC) fitness and leukemic transformation.
The most frequent leukemogenic event in our cohort of DDX41 and RUNX1 carriers-with HM was biallelic somatic variants in DDX41 and RUNX1, respectively (supplemental Figure 6). In contrast, second hits in GATA2 germ line variant carriers with malignancies were not detected. In case study #3, for example, a germ line RUNX1 carrier initially presented with thrombocytopenia before progressing to AML. In this patient, leukemogenesis was associated with the acquisition of a biallelic RUNX1 variant, but at a late stage after the acquisition of TET2, DNMT3A, and SRSF2 variants. Given that we never detected somatic RUNX1 variants in our RUNX1 carriers-without HM cohort, in stark contrast to the high frequency of second-hit RUNX1 variants in our HM cohort, we suggest that somatic RUNX1 variants likely represent a later step that may be key to leukemogenic transformation. Interestingly, for DDX41, the lack of CH gene mutations in carriers was mirrored by a lack of CH gene mutations in malignancy (Figure 5C). This indicates that the molecular natural history of this disorder is quite different from both RUNX1 and GATA2 HHMs. Further longitudinal, lineage tracing, and single-cell sequencing studies are required to determine if these are initiating events in malignancy development and the timeline to disease progression.
Interestingly, both germ line RUNX1 and DDX41 cohorts presented with a sex bias for HM development, but this did not correlate with differences in X-linked somatic variants. Sex bias was not observed in our GATA2 cohort, as we have also observed previously.14 Because RUNX1 genomic alterations have a high correlation with hormone-related cancers, especially cancers common in female patients, and because estrogen is known to play a role in hematopoiesis,48,49 we hypothesize that disruption of specific estrogen signaling pathways in germ line RUNX1 carriers could predispose females to AML.50-53 In the germ line RUNX1 malignancy cohort, recurrent somatic variants affect genes involved in epigenetic regulation; epigenetic dysregulation can occur in leukemogenesis, and sex-specific differences in methylation have been observed in hematopoietic tissue.50,54,55 The innate immune response is also known to be increased in females relative to males.56 For DDX41, given its role as an intracellular pattern recognition receptor that triggers the innate immune response,57 a dysregulated immune response could exaggerate existing differences in innate immunity between males and females, contributing to the observed sex bias in malignancy penetrance. Further investigation is warranted to understand the interplay of these mechanisms in tumorigenesis, which may ultimately inform the development of sex-specific therapies that optimize outcomes for patients with HHM.
Despite a lack of definitive guidelines, limitations, and ongoing debate, molecular monitoring in clinical practice is becoming more widespread.43 Findings from this study have implications for clinical surveillance and counseling for different patients with HHM. For example, in RUNX1 and GATA2 HHMs, regular targeted sequencing of CH genes, even in younger carriers-without HM, will provide a tool to monitor the evolution of the clonal burden associated with these variants. In contrast, given the low VAF and high frequency of somatic DDX41 variants in DDX41 HHMs, serial high-depth sequencing of DDX41 for the common R525H mutation may be a preferred approach in DDX41 carriers-without HM. Although CH is a risk factor for leukemic transformation in the aging population, the presence of CH in inherited bone marrow failure is in some situations associated with somatic rescue or normalization of HSPC fitness. Therefore, it is important to discriminate CH events that are associated with risk for leukemic transformation from CH that results in normalization of function.58 Given that CH variants feasibly confer a step toward HM,36 the high frequency of BCOR and TET2 variants in our cohort of RUNX1 HHM malignancies and their presence in RUNX1 carriers-without HM warrant monitoring of these genes, at least in the research study setting, as potential molecular biomarkers of leukemogenesis. Changes in CH trajectory may eventually inform clinical decision-making, such as the timing of repeat bone marrow biopsies. These decisions will be made in conjunction with more classic clinical tools, such as the monitoring of peripheral blood cell counts.43
This study highlights the immense benefit of international collaboration and data sharing within the HHM and rare disease communities. We have established the framework for the continued accumulation of patient data, including longitudinal molecular monitoring, which is required to define the different risk states associated with leukemogenesis across these disorders. With continued progress, this work may lead to the establishment of a defined molecular risk stratification for leukemia progression in carriers and, with it, the ability to design and test, in clinical trials, interventions to halt progression to full-blown HM in vulnerable HHM-variant carriers. For instance, with regular clinical surveillance, it may be possible to detect individuals who develop second hits in DDX41 or RUNX1 before a clinical diagnosis of HM. These individuals may benefit from intensive clinical surveillance or low-toxicity prophylactic therapies. In contrast, defining TET2, BCOR, or other epigenetic regulators as emerging vulnerabilities opens an avenue for the development of prophylactic treatments for HHM carriers via TET inhibitors, histone deacetylase (HDAC) inhibitors, hypomethylating agents, and combinatorial therapies that do not carry the morbidity and mortality of stem cell transplant. This study provides the most comprehensive investigation of leukemogenic molecular mechanisms in HHMs to date, informing the next generation of studies into the clinical management and surveillance of these disorders and offering potential insights into personalized and preemptive therapies for carriers.
Acknowledgments
The authors thank the patients and their family members for participating in this research program.
This work is supported by a grant from the RUNX1 Research Program. This project is also proudly supported by funding from the Leukemia Foundation of Australia and project grants APP1145278 and APP1164601 from the National Health and Medical Research Council of Australia. This work was produced with the financial and other support of Cancer Council SA's Beat Cancer Project on behalf of its donors and the State Government of South Australia through the Department of Health (PRF Fellowship to H.S.S.). This work was supported by a Damon Runyon Cancer Research Foundation Physician Scientist Training Award (M.W.D.), the Edward P. Evans Foundation Young Investigator Award (M.W.D.), the Cancer Research Foundation (M.W.D.), and a National Institutes of Health (NIH) K12 Paul Calabresi award (M.W.D.). P.A. was supported by a fellowship from The Hospital Research Foundation. Part of this project was undertaken while P.A. was holding a Royal Adelaide Hospital Mary Overton Early Career Fellowship. L.M. is supported by the Associazione Italiana per la Ricerca sul Cancro (Accelerator Award Project 22796; 5x1000 Project 21267; Investigator Grant 2017 Project 20125). L.C.F. and P.V. are supported by Maddie Riewoldt’s Vision. L.A.G. was supported by the Cancer Research Foundation. K.Y. and P.L. are supported by the Division of Intramural Research, National Human Genome Research Institute, NIH. T.R. is supported by a grant of the European Hematology Association and Federal Ministry of Education and Research (BMBF) MyPred (01GM1911B). C.B. is supported by the European Union’s Horizon 2020 Research and Innovation Program under grant agreement number 739593 and by the Ministry of Innovation and Technology of Hungary from the National Research, Development and Innovation Fund, financed under the ED-18-1-2019 to 001, TKP2021-EGA-24 and TKP2021-NVA-15 funding schemes.
NIH Intramural Sequencing Center Comparative Sequencing Program was involved in the generation of sequencing data from the NIH.
Authorship
Contribution: C.C.H., M.W.D., and K.Y. were involved in all aspects of the project including designing the research, manuscript preparation, collecting next-generation sequencing (NGS) and clinical data, American College of Medical Genetics and Genomics (ACMG)-variant classification, somatic-variant analysis, and curation and analysis of the data; A.L.B., L.A.G., P.L., and H.S.S. designed the research, contributed NGS and clinical data, manuscript preparation, and ACMG-variant classification; J.F., L.A.-M., M.J.P., K.E.M., T.H., M.A., P.W., A.W.S., E.K., and R. Sood designed bioinformatic pipelines and analysis; D.M.L. designed the VariantGrid software used for somatic and germ line variant curation; P.V., P.A., S.L.K.-S., J.C., and C.N.H. curated somatic and germ line variant data; B.P. advised on statistical analysis; C.B., A.B.C., M.C., E.D., C.D.D., N.D., R.F., S.F., A.R.-M., B.P., J.M.K., A.K., M.K., J.L., N.V.M., G.N., C.O., K.P.P., C.P., H. Raslova, H. Rienhoff, T.R., R. Susman, K.T., E.V., E.K., R. Schulte, A.P.H., S.M.H., K.P., N.K.P., M.B., A.H.W., C.F., H.M.F., I.D.L., J.C., R. Sood, L.C.F., P.B., D.S., D.H., B.Y., L.M., A.L.B., and C.N.H. contributed NGS data, clinical patient information, and scientific insight; and all authors critically reviewed and approved the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
A complete list of the members of the NIH Intramural Sequencing Center Comparative Sequencing Program appears in “Appendix.”
Correspondence: Anna L. Brown, Department of Genetics and Molecular Pathology, SA Pathology, Frome Rd, Adelaide, SA 5000, Australia; e-mail: anna.brown@sa.gov.au.
References
Author notes
∗C.C.H., M.W.D., and K.Y. contributed equally to this manuscript.
Access to RUNX1 genomics data is available through the RUNX1 database (https://runx1db.runx1-fpd.org/). Original data may be obtained by email request to the corresponding author, Anna L. Brown (anna.brown@sa.gov.au). Access to additional deidentified genomics data is available on request.
The full-text version of this article contains a data supplement.