Key Points
Whole genome sequencing of 36 Hodgkin lymphoma (HL) families identifies 33 coding and 11 noncoding HL-risk variants.
Recurrent damaging variants are observed in known (KDR and KLHDC8B) and novel (PAX5, GATA3, and POLR1E) predisposing loci.
Abstract
Familial aggregation of Hodgkin lymphoma (HL) has been demonstrated in large population studies, pointing to genetic predisposition to this hematological malignancy. To understand the genetic variants associated with the development of HL, we performed whole genome sequencing on 234 individuals with and without HL from 36 pedigrees that had 2 or more first-degree relatives with HL. Our pedigree selection criteria also required at least 1 affected individual aged ≤21 years, with the median age at diagnosis of 21.98 years (3-55 years). Family-based segregation analysis was performed for the identification of coding and noncoding variants using linkage and filtering approaches. Using our tiered variant prioritization algorithm, we identified 44 HL-risk variants in 28 pedigrees, of which 33 are coding and 11 are noncoding. The top 4 recurrent risk variants are a coding variant in KDR (rs56302315), a 5′ untranslated region variant in KLHDC8B (rs387906223), a noncoding variant in an intron of PAX5 (rs147081110), and another noncoding variant in an intron of GATA3 (rs3824666). A newly identified splice variant in KDR (c.3849-2A>C) was observed for 1 pedigree, and high-confidence stop-gain variants affecting IRF7 (p.W238∗) and EEF2KMT (p.K116∗) were also observed. Multiple truncating variants in POLR1E were found in 3 independent pedigrees as well. Whereas KDR and KLHDC8B have previously been reported, PAX5, GATA3, IRF7, EEF2KMT, and POLR1E represent novel observations. Although there may be environmental factors influencing lymphomagenesis, we observed segregation of candidate germline variants likely to predispose to HL in most of the pedigrees studied.
Introduction
Hodgkin lymphoma (HL) is a rare cancer of the lymph nodes that comprises ∼40% to 50% of all lymphomas with a unique distribution that differs geographically and ethnically.1 There is a known bimodal distribution of age at onset; both adolescents and young adults aged between 15 and 39 years and people aged >55 years are more affected.1 This suggests that there may be distinct causes of disease at differing ages of onset, with the younger peak more likely to represent genetic predisposition than lifetime exposures.2 Known risk factors for the development of HL include age, male sex, higher socioeconomic status, smaller family size, living in westernized countries, and familial history of HL.1,3 A twofold to sixfold increased risk of developing HL has been reported for first-degree relatives of probands,4-7 which is among the highest reported for all cancers.4 A twin study demonstrated an increased risk for the development of HL in monozygotic twins but no increased risk for dizygotic twins,3 pointing toward a contribution by genetics vs environmental factors. Importantly, the 20 known cases of HL in monozygotic twins occurred before the age of 50 years (mean age, 25.5 years), in alignment with the hypothesis that genetic susceptibility to developing HL may be more prevalent among individuals diagnosed in the younger (adolescent and young adult) peak.
In contrast with other hematological malignancies, the pathogenetic mechanisms responsible for HL formation are largely unknown and few genomic aberrations have been described thus far.8-11 Genome-wide association studies (GWAS) of HL indicate a role for common genetic variation in the HLA region at 6p21.3212,13 and non-HLA loci including TCF3, REL, GATA3, and IL13.14,15 Whole exome sequencing (WES) studies from the National Cancer Institute (NCI) familial HL study 02-C-0210 identified rare coding variants segregating in families, including a missense variant in KDR16 segregating with disease in 2 pedigrees and 2 POT1 variants17 in separate families. Whole genome sequencing (WGS) identified a nonsynonymous variant in DICER1 and showed downregulation of tumor suppressor microRNAs in carriers.18,19 In addition, a homozygous 56–base pair deletion (c.2836_2892del) in ACAN was found to segregate in a Middle Eastern family.20 Furthermore, variants in KLHDC8B include a highly penetrant translocation between chromosomes 2 and 3 that removes the 5′ untranslated region (UTR) in one pedigree and a 5′ UTR single nucleotide variant (SNV) (rs387906223) found to segregate in 3 HL pedigrees, which reduces translation of KLHDC8B.21
To identify novel rare variants predisposing to HL susceptibility, we performed WGS of 36 pedigrees containing ≥2 individuals with HL, where at least 1 individual was diagnosed at the age of ≤21 years. Selecting for young age at onset is a novel approach intended to increase the probability that HL development reflects a genetic underpinning rather than lifetime exposures. A subset of 23 pedigrees from NCI familial study 02-C-0210 meeting eligibility criteria were included in our study. This allowed for expansion of the pedigrees with additional members, application of WGS, and interrogation of noncoding regions to build on previous WES efforts by Goldin, McMaster et al in the same pedigrees.16 The WGS data were analyzed with rigorous pipelines developed at St. Jude Children’s Research Hospital (SJCRH) to perform family-based segregation for the identification of coding, noncoding, and copy number variants (CNVs) using linkage and filtering approaches. Our findings and the data set amassed are from the largest cohort of familial HL with WGS to date.
Methods
Eligible study participants
We included patients diagnosed with HL at the age of ≤21 years with at least 1 first-degree relative also affected by HL regardless of age and all affected or unaffected relatives who were referred to SJCRH. The research participant or legal guardian provided informed consent for this protocol. In addition, DNA of individuals from 23 families from the Institutional Review Board–approved NCI study (NCT00039676)16,22 met the eligibility criteria and were included. The study was approved by the SJCRH IRB (NCT02795013, NCT03050268) and the Women's and Children's Health Network Human Research Ethics Committee (Adelaide, SA, Australia; 2020/HRE00981).
Sample preparation and sequencing
Samples were obtained from peripheral blood, DNA was extracted per institutional protocol, and WGS was performed by a HudsonAlpha sequencing core. See the supplemental Material, available on the Blood website, for details, including quality control (QC).
Prioritization of coding and noncoding SNVs/indels
We developed the familial variant prioritizer (FAMVP) pipeline23 for annotation and prioritization of coding and noncoding SNVs and insertion/deletions (indels) from WGS and applied it to the 36 HL pedigrees. Within FAMVP, segregation of variants in affected individuals and obligate carriers within each pedigree was checked using SLIVAR.24 Penetrance was estimated using the Bayes risk estimation (formula 2) described by Wang et al.25 Variants present in unaffected individuals were retained unless the carrier was an unaffected spouse or the estimated penetrance was negative (>50% of unaffected siblings were carriers). QC filters were applied as hard filtering thresholds (minimum genotype quality ≥ 20, minimum mean coverage depth ≥ 10, Hardy-Weinberg equilibrium P < 1 × 10⁻⁶, and maximum missing genotypes 10%).26-30 ANNOVAR31 was incorporated to functionally annotate variants. Splicing variants were defined as those within 4 nucleotides of an annotated splice junction and annotated based on predicted impact on splicing using SpliceAI32 and dbscSNV.33 The prioritization workflow used an investigator-derived candidate gene list comprising 160 cancer predisposition genes34-37 and 242 genes associated with HL; the HL-associated gene list was formulated from the Harmonizome38 database using the search term “Hodgkin,” the Cancer GeneticsWeb database,39 published literature, and 3 investigator-selected genes (supplemental Figure 1; supplemental Table 1).
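The QC thresholds and segregation rules described above can be illustrated with a minimal Python sketch. This is not the FAMVP or SLIVAR implementation: the class, field, and function names are hypothetical, the direction of the Hardy-Weinberg filter (dropping sites with P below the threshold) is an assumption, and the sibling rule is a simplification of the penetrance estimate based on the ">50% of unaffected siblings" description.

```python
from dataclasses import dataclass


@dataclass
class SiteQC:
    """Per-variant QC summary; field names are illustrative only."""
    min_genotype_quality: float
    mean_depth: float
    hwe_p: float
    missing_rate: float


def passes_qc(qc: SiteQC) -> bool:
    """Apply the hard thresholds quoted in the text."""
    return (
        qc.min_genotype_quality >= 20
        and qc.mean_depth >= 10
        and qc.hwe_p >= 1e-6       # assumption: sites with HWE P < 1e-6 are dropped
        and qc.missing_rate <= 0.10
    )


def segregates(carriers, affected, obligate, unaffected_spouses, unaffected_siblings):
    """Return True if a variant is consistent with segregation in one pedigree."""
    carriers = set(carriers)
    # Every affected individual and obligate carrier must carry the variant.
    if not (set(affected) | set(obligate)) <= carriers:
        return False
    # Carriage by an unaffected spouse excludes the variant.
    if carriers & set(unaffected_spouses):
        return False
    # Carriage by more than half of the unaffected siblings excludes the variant.
    sibs = set(unaffected_siblings)
    if sibs and len(carriers & sibs) / len(sibs) > 0.5:
        return False
    return True
```

Under these assumptions, a variant carried by both affected siblings and by 1 of 2 unaffected siblings is retained, whereas the same variant carried by a married-in unaffected spouse is removed.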
Scoring of coding and noncoding variants
Coding variants were scored into 4 priority levels C1 to C4 defined based on presence within a predefined candidate gene, annotation as loss-of-function (LOF) (nonsense, frameshift, and splice), and predicted deleteriousness from REVEL26 and CADD27 using the developer-recommended thresholds of 0.5 and 20, respectively (Figure 1A). Noncoding variants were scored into 8 priority levels NC1 to NC8 based on overlap with DNase sequencing or transcription factor chromatin immunoprecipitation sequencing clusters from Encode340 with increased weight for the GM12878 cell line, RegBase30 model scores >10, noncoding RNA overlap, and proximity to genes on our candidate list.30,41 Variants with C1 to C3 and NC1 and NC2 priority level scores were further reviewed. FIMO42 from the meme-suite of tools (version 5.1.0) was used to predict differences in occurrences of known motifs (JASPAR2018_CORE_vertebrates_non-redundant) between reference and alternative alleles with ± 30 base pairs of context sequence.
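The motif comparison between reference and alternative alleles depends on building two sequence windows that differ only at the variant. A minimal sketch of that step follows, assuming the reference sequence is already available as a Python string; the function name and the 0-based coordinate convention are illustrative choices, and the motif scan itself (FIMO) is not reproduced here.

```python
def allele_windows(sequence: str, pos: int, ref: str, alt: str, flank: int = 30):
    """Return (ref_window, alt_window) around a variant.

    `sequence` is a reference sequence string and `pos` is the 0-based
    index of the first reference base of the variant within it.
    """
    if sequence[pos:pos + len(ref)] != ref:
        raise ValueError("reference allele does not match the sequence")
    start = max(0, pos - flank)
    end = min(len(sequence), pos + len(ref) + flank)
    left = sequence[start:pos]                # up to `flank` bases upstream
    right = sequence[pos + len(ref):end]      # up to `flank` bases downstream
    return left + ref + right, left + alt + right


# Toy example: a C>T substitution with 30 bases of context on each side.
toy = "A" * 40 + "C" + "G" * 40
ref_window, alt_window = allele_windows(toy, 40, "C", "T")
```

Each window could then be written to a FASTA file and scanned for motif occurrences, with differences between the two scans indicating a predicted gain or loss of binding.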
Linkage and CNV analysis
Pedigree-specific multipoint parametric linkage analysis was performed using Merlin43 under an autosomal dominant inheritance model with 90% penetrance and an HL disease allele frequency of 0.00067 (ie, the prevalence of 219,128 HL cases in 2018 within the United States).44 Unaffected siblings were coded as unknown. Pedigree-specific maximum MLOD regions were defined. Segregating coding and noncoding variants falling within the 1-LOD support interval of the MLOD are indicated in Table 1.
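For illustration, a 1-LOD support interval can be derived from a per-position LOD curve as the contiguous region around the peak in which the score stays within 1 LOD unit of the maximum. The sketch below assumes the linkage output has already been parsed into parallel lists of positions and LOD scores; the function name and input layout are hypothetical and do not reflect Merlin's output format.

```python
def one_lod_support_interval(positions, lod_scores, drop=1.0):
    """Return (start, end, peak_lod) for the contiguous region around the
    maximum LOD score in which the score stays within `drop` of the peak."""
    if len(positions) != len(lod_scores) or not positions:
        raise ValueError("positions and lod_scores must be non-empty and equal length")
    peak_index = max(range(len(lod_scores)), key=lod_scores.__getitem__)
    peak = lod_scores[peak_index]
    threshold = peak - drop
    # Extend left while the score stays at or above the threshold.
    left = peak_index
    while left > 0 and lod_scores[left - 1] >= threshold:
        left -= 1
    # Extend right in the same way.
    right = peak_index
    while right < len(lod_scores) - 1 and lod_scores[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right], peak


def in_interval(variant_position, interval):
    """True if a variant position falls inside (start, end, peak_lod)."""
    start, end, _ = interval
    return start <= variant_position <= end
```

A segregating variant would be flagged as falling within the support interval when `in_interval` returns True for its position on the same chromosome.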
Family | Sample count | Affected plus obligate carriers | Early-onset family | Within 1-LOD MLOD region | ACMG evidence | Pathogenicity | Priority | Gene | Type | HGVS
---|---|---|---|---|---|---|---|---|---|---
HL1000001 | 4 | 2 | | | PVS1∗, PP1∗ | LP∗, VUS | C3 | MST1R | Coding | NM_002447.4:c.697delinsCA (p.V233Cfs∗16)
HL1000003 | 4 | 2 | | | PM2_supporting, PP1_moderate∗, PP3 | VUS, VUS∗ | NC1 | PAX5 | Noncoding | NM_016734.3:c.605-2310C>T
HL1000007 | 4 | 2 | | | PVS1∗, PP1∗ | LP∗, VUS | C3 | GPNMB | Coding | NM_002510.3:c.367+2T>C
HL1000007 | 4 | 2 | | | PVS1∗, PP1∗ | LP∗, VUS | C3 | BLK | Coding | NM_001715.3:c.369-2A>G
HL1000056 | 4 | 3 | | | BP4, PP1∗ | VUS∗, VUS | C2 | REL | Coding | NM_001291746.2:c.920A>G (p.H307R)
HL1000059 | 4 | 3 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | IGSF10 | Coding | NM_178822.5:c.3296C>G (p.S1099∗)
HL1000059 | 4 | 3 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | GSN | Coding | NM_198252.3:c.1662G>A (p.W554∗)
HL1000060 | 4 | 2 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | CARD9 | Coding | NM_052813.5:c.184+1G>A
HL1000060 | 4 | 2 | | | BP4, PP1∗ | VUS∗, VUS | C3 | MROH2A | Coding | NM_001394639.1:c.4452+1G>A
HL1000060 | 4 | 2 | | | PVS1∗, PP1∗ | LP∗, VUS | C3 | ACOT8 | Coding | NM_005469.4:c.488+1G>A
HL1000060 | 4 | 2 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | ZNF683 | Coding | NM_001114759.3:c.103C>T (p.R35∗)
HL1000063 | 3 | 2 | | | PS3_supporting, PP1_strong∗, PP3 | LP∗, VUS | NC1 | KLHDC8B∗ | Noncoding | NM_173546.3:c.-158C>T
HL1000064 | 5 | 2 | | | PVS1_moderate, PM2_supporting, PP1 | VUS | C1 | KDR∗ | Coding | NM_002253.4:c.3849-2A>C
HL1000064 | 5 | 2 | | | PP1∗, PP3 | VUS∗, VUS | NC1 | RUNX3 | Noncoding | NM_004350.3:c.-61138C>T
HL1000065 | 7 | 3 | | | PP1∗, PP3 | VUS∗, VUS | NC1 | MYB | Noncoding | NM_001130173.2:c.-4939del
HL1000065 | 7 | 3 | | | PP1∗, PP3 | VUS∗, VUS | NC1 | ATF3 | Noncoding | NM_001674.4:c.-5+978A>C
HL1000078 | 5 | 2 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | POLR1E | Coding | NM_022490.4:c.847del (p.L283Sfs∗9)
HL1000078 | 5 | 2 | | | PM2_supporting, PP1∗, BP4 | VUS∗, VUS | C1 | JUNB | Coding | NM_002229.3:c.334C>T (p.P112S)
HL16594 | 6 | 2 | X | | BP4, PP1∗ | VUS∗, VUS | C1 | TCF3 | Coding | NM_003200.5:c.1486G>A (p.E496K)
HL16594 | 6 | 2 | X | | BP4, PP1∗ | VUS∗, VUS | C2 | HBS1L | Coding | NM_006620.4:c.1162A>G (p.I388V)
HL213 | 6 | 2 | X | | PM2_supporting, PP1∗, BP4 | VUS∗, VUS | C1 | RAD51D | Coding | NM_002878.4:c.911G>A (p.G304D)
HL2350 | 6 | 4 | | X | PS3_supporting, PP1_strong∗ | VUS∗, VUS | C1 | KDR∗ | Coding | NM_002253.4:c.3193G>A (p.A1065T)
HL2408 | 4 | 2 | | | PM2_supporting, PP1∗, PP3 | VUS∗, VUS | C3 | CDT1 | Coding | NM_030928.4:c.1477+3_1477+24del
HL2408 | 4 | 2 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | POLR1E | Coding | NM_022490.4:c.445C>T (p.R149∗)
HL2491 | 6 | 2 | | | PS3_supporting, PP1_strong∗, PP3 | LP∗, VUS | NC1 | KLHDC8B∗ | Noncoding | NM_173546.3:c.-158C>T
HL2491 | 6 | 2 | | | PP1∗, PP3 | VUS∗, VUS | NC1 | GATA3 | Noncoding | NM_001002295.2:c.779-2563G>A
HL2694 | 11 | 4 | | X | PVS1∗, PM2_supporting, PP1_moderate∗ | P∗, VUS | C3 | EEF2KMT | Coding | NM_201400.4:c.529A>T (p.K177∗)
HL2696 | 5 | 2 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | EFR3B | Coding | NM_014971.2:c.853C>T (p.Q285∗)
HL3262 | 7 | 3 | | | PM2_supporting, PP1∗, PP3 | VUS∗, VUS | C3 | EIF1AD | Coding | NM_001242481.2:c.88-4C>G
HL3262 | 7 | 3 | | | PS3_supporting, PP1_strong∗ | VUS∗, VUS | C1 | KDR∗ | Coding | NM_002253.4:c.3193G>A (p.A1065T)
HL3262 | 7 | 3 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | DDX10 | Coding | NM_004398.4:c.2059A>T (p.K687∗)
HL3350 | 9 | 2 | X | | PM2_supporting, PP1∗, PP2 | VUS∗, VUS | C1 | STAT3 | Coding | NM_139276.3:c.849A>T (p.K283N)
HL3402 | 8 | 4 | | X | PVS1∗, PM2_supporting, PP1_moderate∗ | P∗, VUS | C1 | IRF7 | Coding | NM_001572.5:c.396G>A (p.W132∗)
HL3929 | 3 | 3 | | | PP1∗, PP3 | VUS∗, VUS | NC1 | GATA3 | Noncoding | NM_001002295.2:c.779-2563G>A
HL3929 | 3 | 3 | | | PS3_supporting, PP1_moderate, PP3 | VUS∗, VUS | C1 | POT1∗ | Coding | NM_015450.3:c.670G>A (p.D224N)
HL4450 | 7 | 5 | | X | PS3_supporting, PP1_strong∗, PP3 | LP∗, VUS | NC1 | KLHDC8B∗ | Noncoding | NM_173546.3:c.-158C>T
HL4479 | 7 | 2 | | | BP4, PP1∗ | VUS∗, VUS | C1 | BAD | Coding | NM_032989.3:c.397A>C (p.K133Q)
HL4479 | 7 | 2 | | | BP4, PP1∗ | VUS∗, VUS | C1 | CLEC16A | Coding | NM_015226.3:c.2578C>T (p.R860C)
HL4643 | 18 | 5 | | | PP1∗, PP3 | VUS∗, VUS | NC1 | IRF8 | Noncoding | NM_002163.4:c.-1-1639C>T
HL5171 | 5 | 2 | | | PP1∗ | VUS∗, VUS | C2 | MET | Coding | NM_000245.4:c.2318C>T (p.P773L)
HL5171 | 5 | 2 | | | BP4, PP1∗ | VUS∗, VUS | C1 | MAP3K7 | Coding | NM_145331.3:c.1282G>A (p.V428I)
HL533 | 6 | 2 | | | PVS1∗, PM2_supporting, PP1∗ | P∗, VUS | C3 | ARMC9 | Coding | NM_001352754.2:c.1268_1271del (p.K423Rfs∗29)
HL6898 | 7 | 3 | | X | PM2_supporting, PP1_moderate∗, PP3 | VUS, VUS∗ | NC1 | PAX5 | Noncoding | NM_016734.3:c.605-2310C>T
HL696 | 7 | 2 | X | | PVS1∗, PP1_moderate∗ | P∗, VUS | C3 | TPRG1 | Coding | NM_198485.4:c.183_192del (p.Y61∗)
Prioritized variants for each family based on segregation and predicted deleteriousness. Family-specific information including sample count and number of affected and obligate carriers and counts for each priority level are provided in addition to highlighted variant details. Variants in KLHDC8B NM_173546.3:c.-158C>T, KDR NM_002253.4:c.3193G>A (p.A1065T), and POT1 were previously reported based on WES analysis of the same NCI pedigrees.16,17,21
BP, benign supporting; HGVS, Human Genome Variation Society; LP, likely pathogenic; P, pathogenic; PM, pathogenic moderate; PP, pathogenic supporting; PS, pathogenic strong; PVS, pathogenic very strong; VUS, variant of unknown significance.
∗Where evidence applied based on hypothesis of genotype-phenotype correlation.
CNVnator45 was used to determine CNV genotypes for each sample. CNV calls not supported by at least 1 discordant read pair alignment were removed. SURVIVOR46 was used to merge individual sample variant call formats into pedigree-specific variant call formats and SLIVAR24 for the identification of segregating CNVs within each pedigree. We then used AnnotSV47 to automatically apply CNV-specific ranking criteria based on the ACMG and ClinGen guidelines. CNVs with ranking ≥4 (P/LP) were reviewed for their potential to be related to the HL phenotype.
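The CNV review criteria described above amount to three successive filters, which can be sketched as follows. The record type and field names are hypothetical and do not correspond to the output schema of CNVnator, SURVIVOR, SLIVAR, or AnnotSV; the sketch assumes those tools have already produced the read support, segregation status, and ACMG class (1 to 5) for each call.

```python
from dataclasses import dataclass


@dataclass
class CNVCall:
    """Minimal CNV record; field names are illustrative, not a tool's schema."""
    cnv_id: str
    discordant_read_pairs: int
    segregates: bool
    acmg_rank: int  # assumed scale: 1 (benign) to 5 (pathogenic)


def cnvs_for_review(calls, min_rank=4, min_discordant_pairs=1):
    """Keep read-supported, segregating CNVs ranked likely pathogenic or higher."""
    return [
        call
        for call in calls
        if call.discordant_read_pairs >= min_discordant_pairs
        and call.segregates
        and call.acmg_rank >= min_rank
    ]


# Toy calls: only the first satisfies all three criteria.
toy_calls = [
    CNVCall("cnv_a", 3, True, 4),
    CNVCall("cnv_b", 0, True, 5),   # no discordant read-pair support
    CNVCall("cnv_c", 2, False, 5),  # does not segregate
    CNVCall("cnv_d", 2, True, 3),   # ranked below the review threshold
]
kept = cnvs_for_review(toy_calls)
```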
Candidate variant review and classification
Coding and noncoding variant prioritization scores, linkage results, literature, and expert opinion were considered when reviewing SNV/indel candidate variants within each family (Figure 1A). This was restricted to C1 to C3 and NC1 and NC2 variants. C4 variants are low scoring in genes that have little to no previous association with HL or related phenotypes and therefore were not included. Final candidate variants were then classified based on recommendations from the ACMG, AMP, and Clinical Genome Resource. Pathogenicity was calculated using the Bayesian formulation of the ACMG/AMP.48 Refer to supplementary material for further details and thresholds.48
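As an illustration of how a Bayesian formulation of the ACMG/AMP criteria turns combined evidence into a posterior probability of pathogenicity, the sketch below uses a commonly cited parameterization: a prior probability of 0.10, odds of pathogenicity of 350 for very strong evidence, and exponents that halve at each lower strength level. These parameter values and the classification cutoffs are assumptions for illustration only; the values actually applied in this study are given in the supplementary material and may differ.

```python
PRIOR_P = 0.10   # assumed prior probability of pathogenicity
O_PVS = 350.0    # assumed odds of pathogenicity for very strong evidence

# Exponent contributed by one criterion at each strength level (assumed scaling).
STRENGTH_WEIGHT = {
    "supporting": 1 / 8,
    "moderate": 1 / 4,
    "strong": 1 / 2,
    "very_strong": 1.0,
}


def posterior_probability(pathogenic, benign=()):
    """Combine evidence strengths into a posterior probability of pathogenicity."""
    exponent = sum(STRENGTH_WEIGHT[s] for s in pathogenic) - sum(
        STRENGTH_WEIGHT[s] for s in benign
    )
    odds_path = O_PVS ** exponent
    return odds_path * PRIOR_P / ((odds_path - 1) * PRIOR_P + 1)


def classify(post_p):
    """Map a posterior probability to a five-tier class (assumed cutoffs)."""
    if post_p > 0.99:
        return "P"
    if post_p >= 0.90:
        return "LP"
    if post_p >= 0.10:
        return "VUS"
    if post_p >= 0.001:
        return "LB"
    return "B"
```

Under these assumptions, 1 very strong plus 2 supporting criteria (for example PVS1, PM2_supporting, and PP1) give a posterior of about 0.994, 1 very strong plus 1 supporting criterion give about 0.988, and 1 supporting pathogenic criterion offset by 1 supporting benign criterion leaves the posterior at the prior of 0.10.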
Results
We performed comprehensive WGS analysis on germline samples of 234 individuals (129 female and 105 male) from 36 unrelated HL pedigrees (Figure 1A and Figure 2). The cohort consisted of 79 affected and 155 unaffected individuals, with an average of 2.6 affected individuals per pedigree (range, 2-5). The median age at diagnosis was 21.98 years (range, 3-55 years). Most individuals (207, 88.5%) were of European ancestry (supplemental Figure 2).
Prioritized HL-risk segregating coding and noncoding variants
On average, families carried 68 segregating C1 to C4 coding variants and 55 segregating NC1 and NC2 noncoding variants (supplemental Table 2). Of these, there were 70 high-priority coding variants (C1 and C2; Figure 1B) and 1769 high-priority noncoding variants (NC1 and NC2; Figure 1C) based on their predicted functional consequences. These prioritized coding and noncoding risk variants (C1-C2 plus C3 and NC1 and NC2) were then subjected to an iterative multireviewer process to narrow down to a list (Table 1; supplemental Table 3) of prioritized genomic candidates for 28 of the 36 pedigrees (Figure 1D-E).
Recurrence among pedigrees
Four recurrent variants were observed (Table 1), including a previously reported coding variant (rs56302315) in KDR16 and a 5′ UTR variant (rs387906223) in KLHDC8B,21 along with 2 novel noncoding variants, 1 in intron 5 of PAX5 (rs147081110) and another (rs3824666) in intron 3 of GATA3 (supplemental Table 4). The PAX5 intronic variant (rs147081110, RegBase PHRED = 10.299, minor allele frequency = 3.94 × 10⁻⁵) overlaps DNase I along with transcription factor binding site (TFBS) clusters with predicted loss of XBP1 binding (Figure 3A). In addition, this variant fell within one of the maximum MLOD regions for pedigree HL6898 (MLOD = 0.30; HG38, chr9:36,833,268-37,034,267), with transmission from the affected mother to both affected children. The GATA3 intronic variant (rs3824666, RegBase PHRED = 15.39, minor allele frequency = 0.0079) overlaps DNase I and TFBS clusters with predicted loss of binding of TCF3 and TCF12 (Figure 3B). TCF3 is purported to act as a tumor suppressor in B-cell malignancies.49 TCF12 (HEB), a critical regulator of hematopoietic cell specification, has been shown to be necessary for proper B-cell and CD4 T-cell generation.50,51 We identified an additional 25 recurrent NC1 and NC2 variants (supplemental Table 4). There were 11 variants with no predicted change in transcription factor binding and 15 with complex predictions by FIMO. Although showing complex predicted binding, the recurrent variant (rs575404240) in IRF8 is of interest because it was the final prioritized candidate for HL4643, one of the largest pedigrees studied.
Gene-level recurrence of variants
Gene-level recurrence, wherein multiple unrelated families harbor different variants in the same gene, was seen for POLR1E (p.L283Sfs∗9, p.R149∗, and p.E41Dfs∗2) in 3 independent pedigrees. Although not in our candidate gene list, these POLR1E LOF variants were of interest because intrachromosomal rearrangements involving this gene have been identified in diffuse large B-cell lymphomas.52 In addition, a splice variant in KDR (c.3849-2A>C) was identified in 1 pedigree, meaning we observed gene-level recurrence of variations in KDR across 3 pedigrees. Although KDR16 and KLHDC8B21 have previously been reported, PAX5, GATA3, and POLR1E represent novel findings.
Variants under maximum MLOD peaks
Prioritized variants in 5 genes (KDR, IRF7, EEF2KMT, KLHDC8B, and PAX5) were of increased interest given they fell under maximum MLOD peaks. KDR,16 KLHDC8B,21 and PAX5 have already been described in “Recurrence among pedigrees.” Two stop-gain variants (IRF7 p.W238∗ and EEF2KMT p.K77∗) were found to segregate within large pedigrees with 4 affected or obligate carriers and fell within or close to linkage peaks with LOD > 0.70. IRF7 p.W238∗ was the only C1 variant prioritized for HL3402 (Figure 4A-C) and showed segregation among 4 affected individuals across 2 generations. It has been classified using the ACMG criteria by InterVar as LP and resides in the interferon regulatory factor DNA-binding domain in the 5′ N-terminal region of the gene. In addition, EEF2KMT p.K77∗ is a C3 variant prioritized for HL2694, a pedigree for which no C1 or C2 variant (ie, in a candidate gene) was identified (Figure 4D-E).
Segregating CNVs
An unbiased genome-wide screen for CNV identified 26 CNVs that segregated with affected individuals within pedigrees and were potentially P/LP based on the AnnotSV ranking (supplemental Table 5A). All 26 CNVs were observed to be common in an internal control cohort of >10 000 samples that were analyzed in a similar manner, and a majority (15/26 CNVs) also overlapped an established benign CNV region. In addition, none of the 26 CNVs overlapped a known candidate gene. We also observed 9 VUS-ranked CNVs that were annotated as affecting a candidate gene; however, only 2 were considered not common (supplemental Table 5B). These 2 rare VUS CNVs (HL1000064-CMIP and HL213-PTPRK) were intronic and shared by most unaffected individuals in each pedigree. Based on the described CNV selection criteria, we determined that none of the segregating CNVs were likely to be associated with the development of HL.
Pedigrees with very early-onset HL
Four pedigrees had a proband with onset of HL at <10 years of age, which is rare even for this disease: the incidence rate is 4.2% at ages <10 years compared with 46.6% at ages 10 to 19 years.44 Thus, we interrogated all variants in tiers C1 to C4 for this subset of pedigrees. Pedigree HL213 had the proband with the earliest onset in the cohort (aged 3 years) and an affected father (onset at 31 years). We observed 15 coding variants for pedigree HL213, only 1 of which (rs200615280, p.G304D) was in a candidate gene: RAD51D, which is involved in DNA repair and has a well-established role in breast and ovarian cancer.53 An independent study of familial HL19 found another segregating variant (rs587781813, p.R266C) in RAD51D. Both variants reside in the same RecA-like nucleoside-triphosphatase protein domain. Using our classification criteria in the context of a potential association with HL, RAD51D p.G304D is classified as VUS based on ACMG PM2, PP1, and BP4 criteria and RAD51D p.R266C as VUS.
The next youngest proband belonged to pedigree HL3350 (onset at 6 years). Their father was also affected but at a later age (onset at 51 years). There were 49 coding variants observed for pedigree HL3350, 2 of which affected study-specific candidate genes, STAT3 (p.K283N) and TP63 (rs201188464, p.P174L). STAT3 has a clear body of literature supporting its role in HL54; thus, it was prioritized over TP63.
Pedigree HL696 has a proband with an onset age of 7 years whose father had an onset at age 31 years. This pedigree also includes an affected great uncle of the proband (onset at 43 years) for whom sequencing data were not available. We observed 78 segregating coding variants for pedigree HL696, none of which affected study-specific candidate genes. However, 1 of the variants is a stop gain in TPRG1 (rs761733372, p.Y61∗), a gene whose overexpression has been associated with B-cell lymphoma.55 There were 2 affected siblings in the fourth pedigree (HL16594), one of whom was diagnosed at 10 years of age. Out of the 131 coding variants observed for this pedigree, there were 5 in candidate genes. A missense variant (rs201250905, p.E496K) affecting TCF3, a gene associated with HL risk, may be relevant in this pedigree.56 In addition, a stop-gain variant (rs758125506, p.K461∗) was observed for this pedigree in GBP5, a gene associated with chronic active Epstein-Barr virus (EBV) infection. We did not prioritize this variant given that its LOF annotation does not support an association with lymphomagenesis, which would be predicted to occur through increased expression or a gain-of-function variant in this gene.57
Discussion
HL is a rare cancer with known familial aggregation but limited understanding of its genetic predisposition. We performed WGS on the largest cohort of pedigrees with multiple occurrences of HL, thus expanding upon previous WES studies to comprehensively interrogate both coding and noncoding variation. To identify potentially causative or disease-susceptibility variants, we considered the following categories: (1) variants identified from large pedigrees with 4 or more affected relatives, particularly if they fell within linkage peaks; (2) recurrent noncoding variants segregating with HL in >1 pedigree; (3) gene-level recurrence under the assumption of genetic heterogeneity; and (4) variants prioritized in very early-onset families (onset of HL at <10 years of age).44
Findings from the largest pedigrees
Five high-confidence variants were found, 1 in each of the largest pedigrees (N = 4-5 affected relatives). Two of the variants had previously been identified by WES in the same families: KDR p.A1065T (HL2350) and KLHDC8B c.-1108C>T (HL4450).16,21,58 The remaining 3 variants (IRF7:p.W238∗; EEF2KMT:p.K116∗; rs575404240 in IRF8) are novel candidates that deserve further investigation. IRF7 plays important roles in innate immunity and immune cell differentiation and is involved in the regulation of EBV latency.59 This variant may affect the clearance of EBV in a host, which is a driver for the formation of lymphoma and may be causal in this pedigree.60 Further analyses to look for latent EBV in this pedigree would be helpful to establish causation. EEF2KMT was the best candidate for pedigree HL2694, which had no C1 or C2 variants in cancer-related candidate genes. However, the role of this gene in lymphomagenesis is unclear. IRF8 is an interferon regulatory factor that plays a role in cellular differentiation, transformation, and apoptosis.61 This gene is expressed in B cells and has been implicated in the formation of chronic lymphocytic leukemia, another B-cell malignancy. The expression of IRF8 is important for B-cell development, including pre–B-cell differentiation, and variations in this gene may be implicated in the formation of HL.61,62 Therefore, these 3 new variants are potentially pathogenic in the largest of our pedigrees and in need of further validation.
Findings in noncoding variants
WGS allowed for the identification of segregating noncoding variants that may be important for HL, particularly in pedigrees for which no obvious coding variant was found. We identified recurrent noncoding variants in PAX5 and GATA3, both well-known HL candidate genes. Predicting and testing the impact of noncoding variants can be complex depending on the tissue specificity and temporal stage of sample collection.63 The recurrent intronic GATA3 variant (rs3824666) is especially interesting because it is predicted to lead to loss of binding of the tumor suppressor TCF3,49 and it corroborates the results from previous GWAS.56 Genome-wide association of a single nucleotide polymorphism within GATA3 (rs3781093, P = 9.49 × 10^−13) with HL was found in a large meta-analysis of 5314 HL cases and 16 749 controls.14 In addition, a meta-analysis in 1816 HL cases and 7877 controls and subsequent replication in an independent set of 1281 HL cases and 3218 controls found a significant association of a common noncoding variant (rs444929) in GATA3.56 Thus, we speculate that the common and low-frequency noncoding variants in GATA3 may be related to both sporadic and familial HL. The other interesting noncoding variant was found in PAX5, which is of significant clinical interest given its association with other cancers such as leukemia and lymphoblastic lymphoma.
PAX5 is needed for B-cell lineage commitment64 and given that HL is a B-cell driven process that loses its B-cell marker and aberrantly expresses CD30, it is plausible that a genetic variant in PAX5 may hinder B cells from remaining committed to the B-cell lineage and thus have a propensity to lose their classic CD20 marker.64 Moreover, the expression of PAX5 is turned on during the transition of B cells from a pre–pro-B cell to a committed pro–B cell, and in mouse models, when PAX5 is turned off, the cells return to an uncommitted progenitor cell.64 Therefore, alterations in PAX5 may explain the loss of CD20 and be related to the formation of Hodgkin Reed-Sternberg cells.
Gene-level recurrence
Recurrence at the gene level was found for POLR1E and KDR. The POLR1E gene is a polymerase (RNA) I polypeptide E programmed death ligand. Intrachromosomal alterations involving POLR1E have been described in diffuse large B-cell lymphoma tissue specimens.52 POLR1E was not on our candidate gene list, but we observed 3 different segregating LOF variants in 3 independent pedigrees, which was a striking finding. Besides the recurrent published KDR coding variant, we identified a novel KDR splice variant (c.3849-2A>C) in pedigree HL1000064 that affects the splice acceptor site before exon 30 (the last exon). This splice variant is predicted to disrupt the native acceptor site. KDR, also known as VEGFR2, has been reported to be associated with familial HL, and this association has been validated with functional experiments.16 The KDR gene comprises 30 exons, and Rotunno et al16 demonstrated that the recurrent missense variant affects the kinase domain activation loop that is important for tumor angiogenesis and cell proliferation and survival. VEGFR’s importance in angiogenesis of HL has been well described.65 This new finding of a splice variant adds to growing evidence that variants affecting KDR have a role in HL susceptibility, and the addition of a second KDR variant in familial HL suggests that this gene could be considered for clinical cancer predisposition screenings.
Variants potentially related to early onset
We were particularly interested in germline variants segregating in 4 early-onset families and hypothesized that germline predisposition may be more relevant in such families because of the presumed higher impact of genetic over environmental causes. This investigation resulted in a few candidates, including RAD51D, STAT3, TPRG1, and TCF3. However, no clear primary candidate emerged from this analysis, with the exception of RAD51D, which has also been implicated in an independent familial HL cohort.19 RAD51D is part of the homologous recombination repair pathway and interacts with BRCA1/2. Burden testing of germline variants in this gene was also associated with an increased risk for ovarian cancer.66 That study reported that ∼69% of RAD51D variants contributing to ovarian cancer fell in the DNA recombination and repair protein RecA-like adenosine triphosphate–binding domain of the gene. The variant segregating with HL in our family falls within the same domain. In contrast to most nonsense and frameshift variants in RAD51D seen in patients with ovarian cancer, the HL-segregating variants from our study (p.G304D) and an independent HL familial study19 (p.R266C) are both missense variants. Somewhat surprisingly, the RAD51D variant in HL213 segregated in 1 but not both monozygotic twins. We hypothesize that the germline susceptibility from RAD51D may be accompanied by environmentally triggered epigenetic factors including DNA methylation, which has been reported as a cause for disease in other discordant twin studies, including in HL.67-69
Strengths of this study include the highly informative pedigrees with multiple affected individuals, at least 1 proband diagnosed at ≤21 years of age, and the application of WGS, which allowed for interrogation of noncoding variants that had not previously been performed, even in the NCI families. In addition, we used several newly available annotations and computational tools for family-based screening of variants, including in-house pipelines for variant calling. Our findings validated previously published results, which served as positive controls.
Limitations include lack of updated clinical information after enrollment and reliance on unaffected married-in spouses as negative controls. Next, we based our interrogation on germline variants only and did not study the contribution of somatic mutations to the patients’ disease because few tumor blocks were available. Viral factors or cooperating mutations may also contribute, but the EBV status of the patients and tumors was unknown. Of note, we selected probands diagnosed with HL at <21 years of age. These patients fit within the first peak of the bimodal age-at-onset distribution, which has the lowest rate of association with EBV. Another limitation is the use of the GM12878 cell line for noncoding variant prioritization based on DNase I hypersensitivity and TFBS overlap, which may not be the optimal cell line for studying HL but has been used previously in GWAS of HL.14
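To make the overlap-based prioritization mentioned above concrete, the following Python sketch checks whether a variant position falls inside both a DNase I hypersensitive interval and a TFBS interval on one chromosome. It is a minimal illustration under stated assumptions, not the study's implementation: the function names (`build_index`, `overlaps`, `in_regulatory_region`), the half-open coordinate convention, the requirement that both annotations overlap, and the toy coordinates are all introduced here for illustration.

```python
from bisect import bisect_right


def build_index(intervals):
    """Sort and merge half-open [start, end) intervals for one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    starts = [s for s, _ in merged]
    return starts, merged


def overlaps(pos, index):
    """True if pos lies inside any merged interval of the index."""
    starts, merged = index
    i = bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < merged[i][1]


def in_regulatory_region(pos, dhs_index, tfbs_index):
    """Assumed rule: keep a variant only if it overlaps both annotation tracks."""
    return overlaps(pos, dhs_index) and overlaps(pos, tfbs_index)


# Toy coordinates, not real GM12878 annotations.
dhs = build_index([(100, 200), (150, 300), (500, 600)])
tfbs = build_index([(250, 260), (550, 560)])

for pos in (255, 120, 300, 50):
    print(pos, in_regulatory_region(pos, dhs, tfbs))
```

Merging the intervals first lets each lookup use a single binary search. Because the annotation tracks come from one cell line, a variant that fails this filter in GM12878 could still lie in a regulatory element that is active in a more HL-relevant cell type.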
Conclusions
We used WGS in 36 highly informative HL pedigrees with pediatric age probands and expanded on previous WES analysis. In the next step, these candidate risk variants should be validated for pathogenicity in a laboratory setting, in large sporadic cohorts of HL cases, and through sequencing of limited tumor samples available to compare the germline genetic changes identified with those seen in Hodgkin Reed-Sternberg cells. The genomic landscape of familial HL remains incompletely characterized. Identification of genetic predisposition variants for the development of HL may lead to novel therapeutic targets, better treatment of this rare disease, and addition of these genes to clinical germline genetic testing panels to facilitate early detection of symptoms, inform genetic counseling, and help determine risk for other family members.
Acknowledgments
The authors thank Bryan Roberts for his assistance with drawing the pedigrees in Figure 2 and members of the St. Jude Children’s Research Hospital’s Center for Applied Bioinformatics for pipeline implementation and computing.
This work was supported by a National Institutes of Health R03 grant (R03HD104066), a National Institutes of Health Cancer Support Core grant (CA-21765), the Lymphoma Research Foundation, the American Lebanese Syrian Associated Charities, and a Gabriella Miller Kids First X01 grant (HL136999-01). The research activities of L.R.G., M.L. McMaster, M.R., N.C., A.V., D.F., K.W., J.L., and M.T. were supported by the Intramural Research Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute. Work in Australia was supported by National Health and Medical Research Council grant APP1164601.
Authorship
Contribution: J.E.F., J.L.M., and K.H. performed the clinical data preparation; J.R.M., J.E.F., J.L.M., N.O., S.R.R., T.-C.C., S.S.T., and E.R. performed the genomic analyses; and J.E.F., J.R.M., and E.R. prepared the manuscript with contributions from J.L.M., N.O., J.J.Y., Y.H., Y.-D.W., W.C., G.W., L.R.G., M.L. McMaster, M.R., N.C., A.V., D.F., K.W., J.L., M.T., C.H., A.L.B., H.S.S., C.M., K.E.N., and M.L. Metzger.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Jamie E. Flerlage, Department of Oncology, St. Jude Children’s Research Hospital, 262 Danny Thomas Pl, Memphis, TN 38105; e-mail: jamie.flerlage@stjude.org; Jun J. Yang, Hematological Malignancies Program, Comprehensive Cancer Center, St. Jude Children’s Research Hospital, 262 Danny Thomas Pl, Memphis, TN 38105; e-mail: jun.yang@stjude.org; and Evadnie Rampersaud, Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, 262 Danny Thomas Pl, Memphis, TN 38105; e-mail: evadnie.rampersaud@stjude.org.
References
Author notes
∗J.E.F., J.R.M., J.J.Y., and E.R. contributed equally to this study.
The results are based upon data generated by Gabriella Miller Kids First Pediatric Research Program projects phs001738.v1.p1 and accessed from the Kids First Data Resource Portal (https://kidsfirstdrc.org) and/or the database of Genotypes and Phenotypes (www.ncbi.nlm.nih.gov/gap). Code used for this publication is available at https://github.com/jrm3215/FAMVP.
The online version of this article contains a data supplement.
There is a Blood Commentary on this article in this issue.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.