Key Points
LAPTM5c403t and HCLS1g496a are potentially novel contributors for the genetic predisposition to familial WM.
LAPTM5c403t and HCLS1g496a represent possible candidates for screening in familial WM.
Abstract
Familial aggregation of Waldenström macroglobulinemia (WM) cases, and the clustering of B-cell lymphoproliferative disorders among first-degree relatives of WM patients, have been reported. Nevertheless, the possible contribution of inherited susceptibility to familial WM remains undefined. We performed whole exome sequencing on germ line DNA obtained from 4 members of a family in which coinheritance of WM was documented in 3 of them, and screened 246 additional independent cases by using gene-specific mutation sequencing. Among the shared germ line variants, LAPTM5c403t and HCLS1g496a were the most recurrent, being present in 3/3 affected members of the index family, detected in 8% of the unrelated familial cases, and present in 0.5% of the nonfamilial cases and in <0.05 of a control population. LAPTM5 and HCLS1 appeared as relevant WM candidate genes that characterized familial WM individuals and were also functionally relevant to the tumor clone. These findings highlight potentially novel contributors for the genetic predisposition to familial WM and indicate that LAPTM5c403t and HCLS1g496a may represent predisposition alleles in patients with familial WM.
Introduction
The evaluation of cancer occurrence within families is important for unraveling the molecular events that drive tumorigenesis. Waldenström macroglobulinemia (WM) represents a B-cell lymphoproliferative disorder, classified as a lymphoplasmacytic lymphoma, according to the World Health Organization classification.1 WM represents a rare B-cell malignancy that accounts for 1% to 2% of all hematologic neoplasms, with an incidence rate of 3 to 4 cases per million people, per year.2,3 Evidence also exists that immunoglobulin M-monoclonal gammopathy of undetermined significance (IgM-MGUS) is associated with an increased risk of developing WM.4,5 Most recently, whole genome sequencing studies have demonstrated the occurrence of MYD88 and CXCR4/WHIM-like somatic variants in more than 90% and 30% to 35% of WM patients, respectively.6-9
Previous studies have identified familial aggregation of WM cases and the clustering of B-cell lymphoproliferative disorders among first-degree relatives of patients with WM,10-16 including chronic lymphocytic leukemia (CLL), non-Hodgkin lymphoma (nHL), multiple myeloma (MM), IgM-MGUS, and IgG/IgA-MGUS. These findings support a possible contribution of inherited susceptibility to familial WM. Nevertheless, genetic linkage studies have failed to clearly identify rare, highly penetrant alleles underlying a subset of B-cell lymphoproliferative disorders, giving way to an oligogenic model whereby relatively common, low-penetrance alleles would contribute to the lymphoproliferative phenotype in familial cases.17,18 In fact, genome-wide association studies have identified common variants associated with MM.19-22 Case-control studies have similarly identified significant associations with other lymphoproliferative disorders.20,23 Whether relatively common germ line variants may contribute specifically to familial WM cases remains unexplored.
We therefore performed whole exome sequencing on germ line DNA obtained from 4 members of a single family with documented coinheritance of WM (3 affected; 1 unaffected) and applied bioinformatics tools to identify candidate germ line variants likely to have a biological role in WM signaling pathways. We screened an additional 246 independent, unrelated WM cases (50 probands from familial cases and 196 individuals from nonfamilial cases) for the identified variants. We report that LAPTM5c403t and HCLS1g496a represent the most recurrent variants, present in 3/3 affected members of the index family. Each of these variants was present in 8% of the unrelated familial cases. Each variant was present in 0.5% of the nonfamilial cases and in <0.05 of a control population (1000 Genomes; www.1000genomes.org). These findings highlight potentially novel contributors for the genetic predisposition to familial WM and suggest possible candidates for screening in familial WM.
Methods
Study oversight
Approval for these studies was obtained from the Dana-Farber Cancer Institute Institutional Review Board. Informed consent was obtained from all patients in accordance with the Declaration of Helsinki. Written consent was obtained from all study participants.
Study participants
We studied a family with 3 first-degree relatives affected with WM, and a healthy unaffected family member who was considered to be a control, and performed whole exome sequencing. An additional 246 participants representing individual probands affected with WM from 246 independent families were included in these studies: 50 of the participants had a family history of WM or other B-cell lymphoproliferative disorders, whereas the remaining 196 were nonfamilial patients. Based on previous identification of familial aggregation of WM cases and clustering of B-cell lymphoproliferative disorders (including CLL, MM, and nHL) among the first-degree relatives of WM patients,11-16 we defined a familial case as a patient with first-degree relatives who are also affected by a B-cell lymphoproliferative disorder, including WM, CLL, nHL, MM, IgM-MGUS, and IgG/IgA-MGUS; nonfamilial cases are individuals whose family members are either healthy or are diagnosed with a solid tumor, but not a B-cell lymphoproliferative malignancy.
Sequence data generation
Genomic DNA was isolated from buccal cells collected from each study participant, including the 246 independent WM cases, and was subjected to library construction, according to standard methods, followed by shearing, end repair, phosphorylation, and ligation to barcoded sequencing adapters. The DNA was size-selected and subjected to exonic hybrid capture using SureSelect v2 Exome bait (Agilent, Santa Clara, CA). Samples were multiplexed and sequenced on Illumina HiSeq flow cells with the goal of an average depth of coverage of 100× (Center for Cancer Computational Biology, Dana-Farber Cancer Institute, Boston, MA). Reads were aligned to GRCh37 using Burrows-Wheeler aligner24 with default parameters, and the resulting SAM files were converted to BAM files using Picard (http://picard.sourceforge.net/).
Quality control of sequencing data
To evaluate the overall quality of sequenced samples, we used BamUtil (http://genome.sph.umich.edu/wiki/BamUtil) to calculate various statistics, including the total number of reads, mapping rate, percentage of proper pairs, and duplication rate. Given that the SureSelectQXT v4 platform covers ∼51 Mb, a mean coverage was calculated for each sample. We evaluated the distribution of the mean coverage across all targeted regions. The DepthOfCoverage function from GATK (v2.74) was used with the “-mmq 10” parameter.25 All unmapped reads, duplicated reads, and reads with low mapping quality (<10) were removed. Finally, a more comprehensive callable analysis on all targeted bases was performed by simultaneously considering sequencing quality, mapping quality, and coverage, according to the CallableLoci tool (GATK, Broad Institute, Cambridge, MA).
Variant calling
Single nucleotide variants (SNVs) were called by GATK,26 based on best practice workflow. Briefly, GATK was used for base quality score recalibration and indel realignment, followed by variant calling with use of UnifiedGenotyper with –stand_call_conf = 30.
Selection of potential WM variants
We first selected variants that occurred in affected members but were absent in the unaffected member. The resulting SNVs were annotated by snpEff27 and Oncotator (http://www.broadinstitute.org/oncotator/); variants located in exonic regions were considered for further analysis. Synonymous variants were filtered out, and the resulting SNVs were annotated with dbNSFP, a database that was developed for functional prediction and annotation of all potential nonsynonymous SNVs in the human genome.25 Allele frequencies in 1000 Genomes, as well as Polymorphism Phenotyping v2 (PolyPhen-2) prediction, were used to quantify the deleteriousness of SNVs. SNVs with allele frequencies >0.05 in 1000 Genomes were filtered out, and the remaining SNVs were defined as potential familial WM-associated variants.
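The selection criteria described above can be illustrated with a short filter. This is a minimal sketch, not the pipeline used in this study: the sample labels, the record fields (`carriers`, `exonic`, `effect`, `af_1000g`), and the toy variants are all hypothetical, and requiring the variant in all 3 affected members is an assumption made for the example.

```python
# Hypothetical sample labels for the index family (3 affected, 1 unaffected).
AFFECTED = {"A1", "A2", "A3"}
UNAFFECTED = {"U1"}
MAX_ALLELE_FREQ = 0.05  # variants above this 1000 Genomes frequency are dropped


def is_candidate(variant):
    """Return True if a variant passes the familial selection criteria."""
    carriers = variant["carriers"]
    # Assumption: the variant must be carried by every affected member.
    shared_by_affected = AFFECTED <= carriers
    absent_in_unaffected = not (UNAFFECTED & carriers)
    nonsynonymous_exonic = variant["exonic"] and variant["effect"] != "synonymous"
    rare = variant["af_1000g"] <= MAX_ALLELE_FREQ
    return shared_by_affected and absent_in_unaffected and nonsynonymous_exonic and rare


# Toy records (hypothetical), one per filtering outcome.
variants = [
    {"id": "v1", "carriers": {"A1", "A2", "A3"}, "exonic": True,
     "effect": "missense", "af_1000g": 0.004},           # passes every filter
    {"id": "v2", "carriers": {"A1", "A2", "A3", "U1"}, "exonic": True,
     "effect": "missense", "af_1000g": 0.004},           # also in the unaffected member
    {"id": "v3", "carriers": {"A1", "A2", "A3"}, "exonic": True,
     "effect": "synonymous", "af_1000g": 0.001},         # synonymous
    {"id": "v4", "carriers": {"A1", "A2", "A3"}, "exonic": True,
     "effect": "missense", "af_1000g": 0.20},            # too common
    {"id": "v5", "carriers": {"A1", "A2"}, "exonic": True,
     "effect": "missense", "af_1000g": 0.004},           # not shared by all affected
]

candidates = [v["id"] for v in variants if is_candidate(v)]
print(candidates)
```

In this toy set only `v1` survives; in practice the annotation fields would come from tools such as snpEff and dbNSFP rather than hand-written records.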
Differential expression analysis
Differential expression was analyzed using the Bioconductor (www.bioconductor.org) package limma in the R statistical computing environment (www.r-project.org),28 using empirical Bayes-moderated t statistics to calculate P values for 2-class unpaired samples. Differentially expressed genes were identified using a false discovery rate (FDR) cutoff of 1%. Among the differentially expressed genes, those with more than a 2-fold change were defined as a signature.
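The two thresholds described above (FDR of 1% and more than a 2-fold change) can be sketched as follows. This is an illustration only: it does not reproduce limma's moderated t statistics, it assumes the P values and log2 fold changes have already been computed, and it assumes the Benjamini-Hochberg procedure for FDR control, which the text does not specify. Gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def signature(genes, p_values, log2_fold_changes, fdr=0.01, min_abs_log2fc=1.0):
    """Keep genes with adjusted P < fdr and more than a 2-fold change."""
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
        if q < fdr and abs(lfc) > min_abs_log2fc  # |log2FC| > 1 means > 2-fold
    ]


# Hypothetical toy input.
genes = ["G1", "G2", "G3", "G4"]
p_values = [0.0001, 0.002, 0.0004, 0.5]
log2_fold_changes = [2.1, 1.5, 0.4, 3.0]

print(signature(genes, p_values, log2_fold_changes))
```

Here `G3` is significant but changes less than 2-fold, and `G4` changes strongly but is not significant, so neither enters the toy signature.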
Gene expression data
GSE12668 was used to define genes that were differentially expressed between bone marrow–derived primary WM cells and their normal counterparts. Tissue specificity for genes of interest in normal tissues was retrieved directly from Gene Enrichment Profiler (GEP; http://xavierlab2.mgh.harvard.edu/EnrichmentProfiler/index.html). Expression levels for genes of interest in primary cancer tissues were downloaded from the cbioportal (http://cbioportal.org),29 which integrates data generated by The Cancer Genome Atlas (TCGA) Research Network (http://cancergenome.nih.gov/) and by many other sources. Gene expression modules were assessed using an independent messenger RNA (mRNA) data set (GSE6691).
Disease gene prioritization
Given the selected potential familial WM-associated variants/genes, we used Gene Relationships Across Implicated Loci (GRAIL)30 coupled with global coexpressed gene database (COXPRESdb)31 to assess the functional relatedness between these genes and those that were differentially expressed between bone marrow–derived primary WM cells compared with their normal counterparts, using GSE12668.
Sanger sequencing
The observed LAPTM5 and HCLS1 variants were validated by Sanger sequencing. The Fisher exact test was used to assess the significance of variants observed in familial cases, compared with those in nonfamilial cases.
Three-dimensional protein modeling
Three-dimensional LAPTM5 protein reconstruction was obtained using the Phyre2 server, as described previously.32
Statistics
All analyses of raw sequencing metrics were performed using the limma package in an R/Bioconductor computational environment; differentially expressed genes were identified using an FDR cutoff of 1%. Fisher exact test was used to assess the significance of variants observed in familial cases compared with those in nonfamilial cases.
Results
Identification of LAPTM5c403t and HCLS1g496a variants in familial WM
We studied a kindred in which 3 members are affected by familial WM (Figure 1A). The diagnosis of WM was confirmed in all cases by histology and immunohistochemistry33 (clinical features are summarized in supplemental Table 1, available on the Blood Web site). Whole exome sequencing was performed on the 3 affected members and 1 unaffected member. The criteria to identify candidate WM-associated variants were nonsynonymous SNVs that were present only in the affected family members and absent in the unaffected member and an allele frequency <0.05 using 1000 Genomes. This initial screen identified 132 candidate exonic, nonsynonymous familial WM variants, mapping to 127 genes (Figure 1B; supplemental Table 1; supplemental Figure 1). We next performed a gene/variant prioritization to select the most significant WM-relevant variants by using GRAIL,30 which integrates text mining or coexpression databases for gene prioritization. GRAIL takes a group of seed genes to build a subnetwork and tests whether a query gene is functionally related to the seeds. Because the seeds should be disease-related genes, we used a WM gene expression signature as seeds and then queried the potential WM variants identified in this study for prioritization. The publicly available gene expression data set (GSE12668)34 was evaluated to define a WM cell-mRNA signature by comparing primary bone marrow–derived CD19+ WM tumor cells obtained from newly diagnosed untreated WM patients with their normal counterparts obtained from healthy individuals (HDs); we found 393 genes that were significantly different between WM patients and HDs, thus confirming the presence of a specific WM mRNA signature (FDR cutoff <1%; Figure 2A; supplemental Table 3).
We next assessed the functional relatedness between potential familial WM-associated genes and the observed WM mRNA signature using the GRAIL algorithm30 coupled with a global coexpression network COXPRESdb, as reported.31 Using an FDR cutoff of 5%, 13 genes were predicted to be functionally related to the WM mRNA signature (Figure 2B). In addition, the observed germ line variants were annotated and interpreted by implementing PolyPhen-2, as described,35 thus allowing us to prioritize the deleteriousness of SNVs shared among all 3 affected WM family members.
It has been reported that disease-related genes tend to be tissue-specific36; we therefore assessed the tissue specificity of the selected genes for further prioritization. The GEP database contains expression profiles and tissue specificity scores for ∼12 000 genes across 126 normal primary human tissues and 23 cancers.37 WM is a B-cell lymphoproliferative disorder and is classified as an IgM-secreting lymphoplasmacytic lymphoma1; we thus used the GEP database to assign a score to each of the selected genes, which was the average of its tissue specificity score across normal B cells. We reasoned that genes containing deleterious variants that are significantly related to the WM mRNA expression signature, and are also highly specific to B cells, may be the most promising familial WM-associated genes. Among the 13 potential candidate genes obtained as the result of the network-based gene prioritization algorithm independently of any B-cell tissue specificity filtering criteria, the most significant gene candidates were found to be HCLS1 and LAPTM5 (Figure 2C; supplemental Table 4). Variants in these 2 genes cosegregated within the affected family members, being absent in the unaffected family member and present in less than 1% of the control population, where allele frequency and heterozygote frequency were 0.38%/0.7% and 0.39%/0.7% for LAPTM5 and HCLS1, respectively, using 1000 Genomes (supplemental Table 5). Both genes are highly B-cell tissue-specific (supplemental Figure 2A-C). In contrast, the 2 control genes HRC and IL22RA1 (which are not functionally related to WM expression signature, but contain benign variants) are not B cell–tissue-specific (supplemental Figure 3A-B). TCGA revealed that the expression of LAPTM5 and HCLS1 is specifically enriched in patients with lymphoid malignancies (diffuse large B-cell lymphoma) as compared with solid tumors (Figure 2D).
These findings were further corroborated by analyzing 1037 tumor cell lines deposited in the Cancer Cell Line Encyclopedia, which showed a significant enrichment of both LAPTM5 and HCLS1 in hematopoietic-related tumors (supplemental Figure 4A-B). Taken together, these observations suggest that LAPTM5 and HCLS1 are both relevant candidate genes for characterizing familial WM individuals and are also relevant to the tumor clone.
Sanger sequencing confirmed the LAPTM5c403t variant in the original family, being present in 3/3 patients with familial WM and absent in the unaffected family member. We next performed Sanger sequencing in 246 independent, unrelated WM cases (50 of which were probands from familial cases and 196 of which were nonfamilial cases). Based on previously identified familial aggregation of WM cases and clustering of B-cell lymphoproliferative disorders among first-degree relatives of patients with WM,11-16 we defined a familial case as a patient with first-degree relatives who were affected by a B-cell lymphoproliferative disorder, including WM, CLL, nHL, MM, IgM-MGUS, and IgG/IgA-MGUS; nonfamilial cases were individuals whose family members were either healthy or diagnosed with a solid tumor, but not a B-cell lymphoproliferative malignancy.
The LAPTM5c403t variant was present in 4 of 50 (8%) of the familial WM cases, but was detected in only 1 of 196 (0.5%) of the nonfamilial cases, thus demonstrating a statistically significant difference in the presence of the LAPTM5c403t variant in familial vs nonfamilial cases (supplemental Table 6; P = .007). Notably, the 4 familial cases that presented with the LAPTM5c403t germ line aberration had a family history of either WM (n = 2), MM (n = 1), or nHL (n = 1) (Figure 3A). The variant is located in exon 5 of the LAPTM5 gene (Figure 3B). LAPTM5 is a 29-kDa protein consisting of 5 hydrophobic transmembrane helical domains and is preferentially expressed in cells of lymphoid and myeloid origin.38 The variant is predicted to be a missense mutation that causes an amino-acid substitution from proline to serine at the fourth transmembrane helical domain (LAPTM5P135S; Figure 3C).
Sanger sequencing also confirmed the HCLS1g496a variant in the original family, being present in 3/3 patients with familial WM and absent in the unaffected family member. The presence of the HCLS1g496a variant was confirmed in 4 of 50 (8%) of the independent familial WM cases and in 1 of 196 (0.5%) of the nonfamilial cases, documenting a statistically significant difference in the numbers of patients carrying the HCLS1g496a variant in familial vs nonfamilial cases (supplemental Table 6; P = .007). Notably, the 4 patients with familial WM that presented with the HCLS1g496a germ line aberration had a family history of CLL (n = 2), WM (n = 1), or nHL (n = 1) (Figure 4A). The variant is located within exon 7 of the HCLS1 gene (Figure 4B). HCLS1 is a 79-kDa intracellular protein that consists of an Arp2/3 complex binding domain, 3.5 tandem repeats, and a coiled-coil region that binds to F-actin and a carboxyl-terminal SH3 domain.39 The variant is predicted to be a missense mutation that causes an amino-acid substitution from aspartic acid to asparagine at the third HS1 repeat (HCLS1D166N; Figure 4C).
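The familial vs nonfamilial comparison reported above (4 of 50 carriers vs 1 of 196) can be checked with a small Fisher exact test. The sketch below is a plain-Python illustration rather than the software used in this study; it assumes a two-sided test, and the function name `fisher_exact_two_sided` is introduced here for the example.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability of k carriers in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    # (a small tolerance guards against floating-point ties).
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= observed * (1 + 1e-9)
    )


# Rows: familial, nonfamilial; columns: carriers, noncarriers.
p_value = fisher_exact_two_sided(4, 46, 1, 195)
print(f"P = {p_value:.4f}")
```

With these counts the P value is approximately .0066, in line with the reported P = .007; because both variants have the same carrier counts, the same calculation applies to LAPTM5c403t and HCLS1g496a.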
LAPTM5 and HCLS1 impact on WM disease biology
To interrogate a possible involvement of LAPTM5 and HCLS1 in WM biology, we used the Human Experimental/Functional Mapper (HEFalMp) database to construct 2 modules for LAPTM5 and HCLS1. The HEFalMp database provides a global gene–gene association map, predicted by integrating hundreds of publicly available genomic datasets,40 and has led to the identification of the top 25 genes associated with LAPTM5 or HCLS1 (Figure 5A). Of note, HCLS1 was part of the LAPTM5 module, and vice versa, suggesting that both genes function through the same biological module. In addition, 52% of the identified genes were shared between the 2 modules.
Besides LAPTM5 and HCLS1, 2 other known WM-related genes, MYD88 and TNFAIP3, were also present in this subnetwork: the MYD88 L265P somatic mutation is present in 91% of patients with WM,6 and TNFAIP3 lies within the region affected by the 6q deletion, the most frequent cytogenetic event described in WM.41 Of note, although MYD88 belonged to the HCLS1 module, TNFAIP3 appeared within the LAPTM5 module.
We next interrogated the most significant pathways that were connected to each subnetwork by using a gene-pathway connectivity map that was generated by integrating a 70K microarray, Functional Annotation of the Mammalian Genome 5, and protein–protein interaction data, as previously reported.42 Statistically significant pathways were identified testing 184 Kyoto Encyclopedia of Genes and Genomes pathways and using a permutation test, with an adjusted P value of .05. The LAPTM5 and HCLS1 subnetworks shared similar enriched pathways, including immune-related pathways, B-cell receptor-, Janus kinase/signal transducer and activator of transcription-, vascular endothelial growth factor receptor-, and chemokine-signaling pathways as well as cytokine–cytokine receptor interaction (supplemental Table 7).
We combined the LAPTM5 and HCLS1 modules into a single WM-associated module and investigated whether this module is disrupted in WM patients compared with HDs43 using an independent gene expression profile dataset (GSE6691),44 and demonstrated a statistically significant high gene-to-gene connectivity in HDs (Figure 5B-C). In contrast, gene-to-gene connectivity was significantly inferior in WM cells (Figure 5D), wherein the subnetwork was disrupted (Figure 5E). Using 10 000 random modules with the same size as background, we found that the change in mean connectivity was significantly different in the WM vs the HD modules (P < .0001, Figure 5F). Together, these findings suggest a conserved high interactivity between LAPTM5 and HCLS1 in normal B cells, whereas WM cells present a disrupted pattern of connectivity that likely impacts disease biology.
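The comparison against random modules described above can be sketched as a permutation-style test. This is a hedged illustration on synthetic data, not the analysis performed in this study: the text does not define the connectivity measure, so mean absolute pairwise Pearson correlation is assumed here, and the gene names, sample sizes, and the use of 1000 (rather than 10 000) random modules are choices made for the example.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_connectivity(expr, genes):
    """Assumed connectivity measure: mean |r| over all gene pairs in the module."""
    pairs = list(combinations(genes, 2))
    return sum(abs(pearson(expr[g], expr[h])) for g, h in pairs) / len(pairs)


def connectivity_change_p(expr_hd, expr_wm, module, n_random=1000, seed=0):
    """Empirical P value for the HD-minus-WM connectivity change of a module."""
    rng = random.Random(seed)
    all_genes = sorted(expr_hd)
    observed = mean_connectivity(expr_hd, module) - mean_connectivity(expr_wm, module)
    hits = 0
    for _ in range(n_random):
        # Random module of the same size drawn from all genes.
        rand_mod = rng.sample(all_genes, len(module))
        change = mean_connectivity(expr_hd, rand_mod) - mean_connectivity(expr_wm, rand_mod)
        if change >= observed:
            hits += 1
    return observed, (hits + 1) / (n_random + 1)


# Synthetic data: module genes share a signal in HD samples but not in WM samples.
rng = random.Random(1)
n_samples = 12
shared = [rng.gauss(0, 1) for _ in range(n_samples)]
module = [f"M{i}" for i in range(5)]
background = [f"B{i}" for i in range(35)]
expr_hd = {g: [s + rng.gauss(0, 0.2) for s in shared] for g in module}
expr_hd.update({g: [rng.gauss(0, 1) for _ in range(n_samples)] for g in background})
expr_wm = {g: [rng.gauss(0, 1) for _ in range(n_samples)] for g in module + background}

observed, p_value = connectivity_change_p(expr_hd, expr_wm, module)
print(f"connectivity change = {observed:.2f}, empirical P = {p_value:.4f}")
```

The `(hits + 1) / (n_random + 1)` form keeps the empirical P value strictly positive, so its lower bound depends on how many random modules are drawn.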
Discussion
Recent studies on the genomic landscape of WM have described recurrent somatic aberrations, including MYD88L265P and CXCR4S338X,6-8,45 whereas the germ line determinants of familial WM cases remain unexplored. We have shown that the LAPTM5c403t variant may predispose to familial WM, and that in transformed WM cells there is a disrupted pattern of connectivity between LAPTM5 and MYD88, leading to the hypothesis that these 2 genes may interplay in supporting the pathogenesis of this disease.
In summary, the identification of germ line variants in genes that display oncogenic properties in B-cell lymphoproliferative disorders offers new insights into the molecular mechanisms of lymphoplasmacytic lymphoma pathogenesis, particularly in WM. The LAPTM5 and HCLS1 genes show relevant tissue-specific expression for WM and a predicted functional relationship to the WM phenotype by expression signature. The specific and recurrent LAPTM5c403t and HCLS1g496a variants observed in this study demonstrated segregation with disease in a striking familial WM pedigree and enrichment in other familial cases.
According to allele and genotype frequencies from the 1000 Genomes database, ∼1 of 132 people (0.7%) in the general population carries either the HCLS1g496a or the LAPTM5c403t variant. It is therefore expected that roughly 1 in 17 000 (0.0058%) individuals in the general population carries both of these independently assorting variants. This latter number aligns with population estimates of WM prevalence. WM represents a rare phenotype that could plausibly require more predisposing genetic factors on average than associated and much more common B-cell lymphoproliferative disorders. This would be consistent with our finding of both the LAPTM5c403t and HCLS1g496a variants in the 3 individuals affected with WM in our primary study family for this report. Familial cases with less striking family history were enriched for 1 variant or the other, but none had both. One can envision a model by which 2 risk variants predispose to WM by a combinatorial mechanism. A polygenic model also remains possible.
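The double-carrier estimate above follows from multiplying the two carrier frequencies. The short calculation below is a sketch under the stated assumption that the two variants assort independently and that each is carried by about 1 in 132 people; the variable names are introduced here for illustration.

```python
# Approximate carrier frequency for each variant, as stated in the text.
carrier_freq = 1 / 132

# Assumption: the two variants assort independently, so frequencies multiply.
both_freq = carrier_freq ** 2
one_in = round(1 / both_freq)

print(f"single carrier: {carrier_freq:.2%}")
print(f"double carrier: {both_freq:.4%} (about 1 in {one_in})")
```

The product comes to about 1 in 17 424, which the text rounds to roughly 1 in 17 000.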
Previous studies have reported LAPTM5 overexpression in patients with B-cell lymphomas.46 Our findings indicate LAPTM5c403t as the most significantly predicted variant to be functionally related to the WM mRNA signature. Of note, LAPTM5 has been shown to support nuclear factor κB (NF-κB) activation upon tumor necrosis factor-α stimulation,47 and primary WM cells reportedly present with a constitutive activation of the NF-κB pathway48; we therefore hypothesize that mutated LAPTM5 may contribute to NF-κB modulation in WM cells. Further studies would be needed to better characterize the relevance of LAPTM5c403t in regulating canonical and noncanonical NF-κB activity in WM cells at the protein level and the potential relevance of the variant in regulating WM cell proliferation.
LAPTM5c403t and HCLS1g496a may represent predisposition alleles in patients with familial WM. Future studies will be needed to clarify the penetrance of specific alleles as well as possible combinatorial effects. Our findings suggest that the contribution of the LAPTM5c403t and HCLS1g496a variants to WM susceptibility should be further investigated.
The online version of this article contains a data supplement.
The data reported in this article have been deposited in the Gene Expression Omnibus database (accession number: SRP053196).
Acknowledgments
We thank the International Waldenstrom’s Macroglobulinemia Foundation, the Michelle and Steven Kirsch Laboratory for Waldenström’s, and The Heje fellowship for Waldenstrom’s. We also thank Dr Sonal Jhaveri for editing the manuscript.
Authorship
Contribution: A.M.R., A.S., J.S., and I.M.G. conceived and designed the experiments and analyzed the data; A.M.R. wrote the manuscript; J.S. and W.H. performed bioinformatics analysis; A.P.-G., A.S., M. Chiarini, Y.A., Y.M., and M.M. performed the experiments; M. Correll, S.M., S.G., E.M.V.A., and Y.K. analyzed the data; and J.J.C., S.P.T., L.I., G.R., J.R.B., M.R.I., M.L.F., I.R., E.H., and I.M.G. revised the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Irene M. Ghobrial, Dana-Farber Cancer Institute, Department of Medical Oncology, 450 Brookline Ave, Boston, MA 02115; e-mail: irene_ghobrial@dfci.harvard.edu; and Aldo M. Roccaro, Dana-Farber Cancer Institute, Department of Medical Oncology, 450 Brookline Ave, Boston, MA 02115; e-mail: aldo_roccaro@dfci.harvard.edu.
Author notes
A.M.R., A.S., and J.S. contributed equally to this work.