Abstract
We performed a meta-analysis of 3 genome-wide association studies to identify additional common variants influencing chronic lymphocytic leukemia (CLL) risk. The discovery phase was composed of genome-wide association study data from 1121 cases and 3745 controls. Replication analysis was performed in 861 cases and 2033 controls. We identified a novel CLL risk locus at 6p21.33 (rs210142; intronic to the BAK1 gene, BCL2 antagonist killer 1; P = 9.47 × 10−16). A strong relationship between risk genotype and reduced BAK1 expression was shown in lymphoblastoid cell lines. This finding provides additional support for polygenic inheritance to CLL and provides further insight into the biologic basis of disease development.
Introduction
Chronic lymphocytic leukemia (CLL) is the most common form of lymphoid malignancy in Western countries.1 Although CLL shows a strong familial risk,2 the genetic basis of inherited predisposition to CLL is largely unknown. Recent genome-wide association studies (GWAS) of CLL have provided evidence that the coinheritance of multiple low-risk variants located on chromosomes 2q37.1, 2q37.3, 2q13, 6p21.3, 6p25.3, 8q24.21, 11q24.1, 15q21.3, 15q23, 15q25.2, 16q24.1, and 19q13.32 contributes to the heritability of CLL.3–6
The statistical power of individual GWAS has been limited by the modest effect sizes of individual genetic variants, the need to establish stringent statistical significance thresholds, and financial constraints on the number of variants that can be followed up. Meta-analysis of existing GWAS data therefore offers the opportunity to discover additional CLL susceptibility loci.
In this study, we conducted a meta-analysis of GWAS data, followed by validation in an independent case-control series, enabling us to identify a novel susceptibility locus for CLL at 6p21.33.
Methods
Participants
All data collection from study participants was approved by the respective institutional review boards, and all participants provided written informed consent in accordance with the Declaration of Helsinki. For all cases, the diagnosis of CLL had been pathologically confirmed in accordance with the World Health Organization guidelines.7
Discovery datasets
The discovery phase was composed of 3 previously described GWAS conducted in the United Kingdom (UK-GWAS)4,5 with 503 cases and 2699 controls, in the San Francisco Bay Area (SF-GWAS)8 with 211 cases and 750 controls, and in the Genetic Epidemiology of CLL (GEC) consortium (GEC-GWAS)3 with 407 cases and 398 controls (supplemental Methods, available on the Blood Web site; see the Supplemental Materials link at the top of the online article).
Replication series
The replication series was composed of 861 CLL cases (565 men; mean age at diagnosis, 61.9 years), ascertained through United Kingdom hematology clinics. Controls were 2033 healthy persons recruited to the National Cancer Research Network genetic epidemiologic studies, the National Study of Colorectal Cancer,9 the Genetic Lung Cancer Predisposition Study,10 and the Royal Marsden Hospital Trust/Institute of Cancer Research Family History and DNA Registry. These controls were the spouses or unrelated friends of persons with malignancies. Both cases and controls were British residents and of European ancestry. Genotyping was conducted using competitive allele-specific PCR KASPar chemistry (KBiosciences). To confirm genotyping accuracy, duplicate samples were genotyped, along with direct sequencing of subsets of samples. For all single nucleotide polymorphisms (SNPs), more than 99% concordant results were obtained.
Statistical analysis
Main analyses used R Version 2.5, Stata Version 11, and PLINK Version 1.07 software.11 The association between each SNP and CLL risk was assessed by the Cochran-Armitage trend test. Odds ratios (ORs) and 95% confidence intervals were calculated by unconditional logistic regression. Meta-analysis was conducted under a fixed-effects model. The Cochran Q statistic (to test for heterogeneity) and the I2 statistic (to quantify the proportion of the total variation due to heterogeneity) were calculated.12 Associations by sex, age, and clinicopathologic phenotypes were examined by logistic regression in case-only analyses. The familial risk attributable to SNPs was calculated as previously described.13 Linkage disequilibrium metrics were based on HapMap Data Release 27/phase 3 (February 2009) on the NCBI B36 assembly.
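As an illustration of the per-SNP association test named above, the following is a minimal sketch of a Cochran-Armitage trend test on a 2 × 3 genotype table. It is not the PLINK implementation used in the study; the additive scores (0, 1, 2 copies of the risk allele), the function name, and the genotype counts are assumptions made purely for illustration.

```python
import math

def trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test (1 df) for genotype counts.

    cases / controls: counts per genotype, ordered by number of risk alleles.
    scores: assumed additive coding; not taken from the source.
    Returns (chi-square statistic, two-sided P value).
    """
    totals = [a + b for a, b in zip(cases, controls)]   # n_i per genotype
    n = sum(totals)                                     # all subjects
    r = sum(cases)                                      # all cases
    s_r = sum(s * c for s, c in zip(scores, cases))     # sum of s_i * r_i
    s_n = sum(s * t for s, t in zip(scores, totals))    # sum of s_i * n_i
    s2_n = sum(s * s * t for s, t in zip(scores, totals))
    numerator = n * (n * s_r - r * s_n) ** 2
    denominator = r * (n - r) * (n * s2_n - s_n ** 2)
    chi2 = numerator / denominator
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts (not study data): risk-allele dosage 0, 1, 2.
chi2, p = trend_test(cases=[10, 20, 30], controls=[30, 20, 10])
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```

The statistic depends only on the linear trend in case proportion across genotype classes, which is why it is a natural match for the per-allele ORs reported in the results.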
Results and discussion
The combined GWASs provided genotype data on 1121 cases and 3745 controls; imputation based on data from the HapMap project allowed association testing for more than 1 500 000 SNPs. We verified the previously identified risk loci (all P < .05) in the meta-analysis (supplemental Table 1) and further identified 15 SNPs that mapped to 6 novel loci.
To validate these findings, we genotyped the top 10 informative SNPs in an additional series of 861 CLL cases and 2033 controls (Table 1; supplemental Table 2).
SNP | Position, bp | Risk allele | UK-GWAS RAF | UK-GWAS OR (95% CI) | UK-GWAS Ptrend | SF-GWAS RAF | SF-GWAS OR (95% CI) | SF-GWAS Ptrend | GEC-GWAS RAF | GEC-GWAS OR (95% CI) | GEC-GWAS Ptrend | Replication RAF | Replication OR (95% CI) | Replication Ptrend | Combined OR (95% CI) | Combined Ptrend | Phet (I2)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
rs210134 | 33 648 187 | G | 0.68 | 1.31 (1.13-1.51) | 2.95 × 10−4 | 0.69 | 1.56 (1.21-2.02) | 6.39 × 10−4 | 0.70* | 1.37 (1.08-1.75)* | .0096* | 0.68 | 1.35 (1.19-1.54) | 4.87 × 10−6 | 1.37 (1.22-1.53) | 1.03 × 10−12 | .50 (0%)
rs210142 | 33 654 815 | C | 0.70* | 1.35 (1.17-1.57)* | 6.82 × 10−5* | 0.70* | 1.58 (1.22-2.05)* | 5.44 × 10−4* | 0.70* | 1.38 (1.09-1.76)* | .0077* | 0.69 | 1.47 (1.28-1.68) | 2.41 × 10−8 | 1.40 (1.25-1.57) | 9.47 × 10−16 | .59 (0%)
RAF indicates risk allele frequencies (based on control data).
*Imputed genotype.
The 2 SNPs localizing to 6p21.33, rs210134 and rs210142, showed convincing evidence of an association in the replication series (OR = 1.35, P = 4.87 × 10−6 and OR = 1.47, P = 2.41 × 10−8, respectively; Table 1). In the combined analysis, both rs210142 (OR = 1.40, P = 9.47 × 10−16) and rs210134 (OR = 1.37, P = 1.03 × 10−12) provided evidence for an association with CLL at genome-wide significance (ie, P < 5.0 × 10−8; Table 1). None of the other SNPs replicated.
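The combined estimates come from a fixed-effects meta-analysis with Cochran Q and I2 heterogeneity statistics, as described in the statistical analysis section. A minimal sketch of inverse-variance pooling is given here, using the rounded per-study ORs and 95% confidence intervals for rs210142 from Table 1 as inputs. Recovering standard errors from rounded confidence limits is an assumption of this sketch (the study worked from genotype-level data), and the function names are illustrative, so the output should only approximate the published combined values.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval

def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
    return math.log(odds_ratio), se

def fixed_effects(estimates):
    """Inverse-variance fixed-effects pooling of (log_or, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    df = len(estimates) - 1
    # I2: share of total variation attributed to heterogeneity, floored at 0.
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P
    return {
        "or": math.exp(pooled),
        "ci": (math.exp(pooled - Z_95 * pooled_se),
               math.exp(pooled + Z_95 * pooled_se)),
        "p": p_value,
        "q": q,
        "i2": i2,
    }

# rs210142 rows of Table 1: UK-GWAS, SF-GWAS, GEC-GWAS, replication.
rs210142 = [
    log_or_and_se(1.35, 1.17, 1.57),
    log_or_and_se(1.58, 1.22, 2.05),
    log_or_and_se(1.38, 1.09, 1.76),
    log_or_and_se(1.47, 1.28, 1.68),
]
result = fixed_effects(rs210142)
print(result)
```

Because Q falls below its degrees of freedom for these inputs, I2 is truncated to 0%, in line with the heterogeneity column of Table 1.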
rs210142 maps to chromosome 6p21.33, within intron 1 of the BAK1 gene (BCL2 antagonist killer 1; MIM: 600516), and rs210134 maps 100 kb telomeric to BAK1 (supplemental Figure 1). Both SNPs map to the same region of linkage disequilibrium and are highly correlated (r2 = 0.8, D′ = 0.9). BAK1 promotes apoptosis by binding to and antagonizing the apoptosis repressor activity of BCL2 and other antiapoptotic proteins.14,15 Somatic rearrangements of the immunoglobulin heavy chain locus of BCL2 that result in constitutive BCL2 overexpression are found in both CLL and follicular lymphomas. The expression of BAK1 is essential for the maintenance of B-cell homeostasis; in mice that are conditionally deficient in BAK1, there is an accumulation of immature and mature follicular B cells with defective cell cycling in response to B-cell receptor stimulation.16
To explore whether the 6p21.33 association reflects cis-acting regulatory effects on BAK1, we analyzed publicly available mRNA expression data (Figure 1). A strong relationship between rs210134 risk genotype and reduced BAK1 expression was shown in both datasets (P = 7.8 × 10−5 and .0389, combined P = 4.4 × 10−5; Figure 1). This result suggests a biologically plausible mechanism by which reduction of BAK1 expression alleviates repression of antiapoptotic proteins, thereby inhibiting apoptosis and hence contributing to B-cell neoplasia. BAK1 does not, however, directly interact with BCL2, but its interaction is dependent on the allelic form of BCL2 present in cells; thus, the effects of BAK1 are context dependent.18
CLL shows male predominance and can be classified on the basis of the presence or absence of somatic hypermutations of IGVH genes, with mutated CLL having a better prognosis.19 We assessed the relationship between age, sex, IGVH mutation, and rs210142 and rs210134 genotypes by case-only logistic regression (supplemental Methods). rs210142 and rs210134 showed no evidence of a relationship with age, sex, or IGVH mutation status (supplemental Table 3). Furthermore, using data from the CLL4 trial patients (supplemental Methods), we found no evidence that rs210134 genotype influences overall survival or progression-free survival (hazard ratio = 1.11, P = .41 and hazard ratio = 1.12, P = .23, respectively).
We investigated the combined effect of the 6p21.33 variation and the previously identified risk variants on CLL risk. No evidence of interaction between any of the loci (P > .05) was observed, compatible with each locus having an independent role in defining risk. Whereas the risks of CLL associated with the 6p21.33 and other variants are individually modest, the carrier frequencies of the risk alleles are high in those persons of European ancestry; therefore, the loci contribute significantly to the development of CLL.
Collectively, the currently identified susceptibility loci account for approximately 16% of the familial CLL risk. Previous genetic linkage studies have failed to provide evidence that rare, high-penetrance genes contribute significantly to the familial risk. Our study had moderate power (∼ 50%) to detect variants, such as 6p21.33, indicating that additional common variants with similar or smaller effects might be identified with additional GWAS data.
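The source cites the method for the familial risk calculation (reference 13) without restating it. As a hedged sketch, one commonly used approach computes a locus-specific sibling relative risk from the risk allele frequency and genotype relative risks, then expresses its logarithm as a fraction of the logarithm of the overall familial relative risk. Whether this exact formula was used here is an assumption, and the overall familial relative risk passed in below (8.5) is an illustrative placeholder rather than a value from this article.

```python
import math

def locus_sibling_rr(p, r1, r2):
    """Sibling relative risk attributable to one biallelic locus.

    p: risk allele frequency; r1, r2: relative risks for heterozygotes
    and risk-allele homozygotes versus non-carriers.
    """
    q = 1.0 - p
    numerator = p * (p * r2 + q * r1) ** 2 + q * (p * r1 + q) ** 2
    denominator = (p ** 2 * r2 + 2.0 * p * q * r1 + q ** 2) ** 2
    return numerator / denominator

def fraction_of_familial_risk(locus_rrs, overall_frr):
    """Share of familial risk explained, assuming loci combine multiplicatively."""
    return sum(math.log(rr) for rr in locus_rrs) / math.log(overall_frr)

# rs210142: RAF 0.70 and combined per-allele OR 1.40 from Table 1,
# with a log-additive model assumed (r2 = r1 squared).
lam = locus_sibling_rr(p=0.70, r1=1.40, r2=1.40 ** 2)
# overall_frr=8.5 is a placeholder assumption, not a value from this article.
share = fraction_of_familial_risk([lam], overall_frr=8.5)
print(f"locus sibling RR = {lam:.4f}, share of familial risk = {share:.2%}")
```

Under these assumptions a single locus of this size accounts for only a small share of the familial risk, which fits the statement that all known loci together explain a modest fraction.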
By pooling GWAS data and conducting replication analyses, we have identified a novel CLL susceptibility locus. Although additional analyses are required to determine the functional consequences of 6p21.33 variation, the findings further highlight the importance of genetic variation in the B-cell developmental pathway as a biologic basis of CLL pathogenesis. The frequency of the rs210142 risk allele is substantially higher in Europeans than in other ancestry groups, consistent with adaptive selection. Hence, it will also be intriguing to explore how our findings translate to non-European populations, some of which are typified by a significantly lower prevalence of CLL.20
The online version of this article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
The authors thank the study participants and the study coordinators for work in recruitment. This study made use of genotyping data from the 1958 Birth Cohort and NBS samples, kindly made available by the Wellcome Trust Case Control Consortium 2. A full list of the investigators who contributed to the generation of the data is available at http://www.wtccc.org.uk/. NHS funding for the Royal Marsden Biomedical Research Center is acknowledged.
URLs are as follows: the R suite, http://www.r-project.org; Illumina, http://www.illumina.com; dbSNP, http://www.ncbi.nlm.nih.gov/projects/SNP; HapMap, http://www.hapmap.org; 1000Genomes, http://www.1000genomes.org; SNAP, http://www.broadinstitute.org/mpg/snap; IMPUTE, https://mathgen.stats.ox.ac.uk/impute/impute.html; SNPTEST, http://www.stats.ox.ac.uk/~marchini/software/gwas/snptest.html; MACH1, http://www.sph.umich.edu/csg/abecasis/MACH; Wellcome Trust Case Control Consortium, http://www.wtccc.org.uk; Mendelian Inheritance In Man, http://www.ncbi.nlm.nih.gov/omim; Cancer Genome Atlas project, http://cancergenome.nih.gov; and Genevar (GENe Expression VARiation), http://www.sanger.ac.uk/resources.
In the GEC Consortium, the work was supported in part by the National Institutes of Health (NIH; grants CA118444 and CA148690, S.L.S.; and grant CA92153, J.R.C.); the Intramural Research Program of the NIH, National Cancer Institute (NCI); and the Veterans Affairs Research Service. Data collection in Utah was made possible by the Utah Population Database and the Utah Cancer Registry. Partial support for all data in the Utah Population Database was provided by the University of Utah Huntsman Cancer Institute. The Utah Cancer Registry is funded by the National Cancer Institute Surveillance Epidemiology and End Results program (contract HHSN261201000026C) with additional support from the Utah State Department of Health and the University of Utah. Sample collection at Duke University was supported by a Leukemia & Lymphoma Society Career Development Award (M.C.L.), the Bernstein Family Fund for Leukemia and Lymphoma Research, and the NIH (1K08CA134919, M.C.L.). The SF-GWAS was supported by the NIH (grants CA122663, CA154643-01A1, and CA104682, to C.F.S.) and the NCI NIH (grants CA45614 and CA89745, to P.M.B.). E.H. is a faculty fellow of the Edmond J. Safra Bioinformatics program at Tel-Aviv University. In the United Kingdom, Leukemia Lymphoma Research Fund provided principal funding for the study (LRF05001 and 06002). Additional funding was provided by Cancer Research UK (C1298/A8362 supported by the Bobby Moore Fund), and the Arbib Fund. M.C.D.B. was supported by the NIH (CA148690).
Authorship
Contribution: S.L.S., C.F.S., and R.S.H. designed the study, obtained financial support, and drafted the manuscript; M.C.D.B., L.C., E.H., R.W., D.J.S., and S.K.M. conducted statistical and bioinformatic analyses; P.B. performed laboratory management and oversaw genotyping of United Kingdom cases and controls; N.C., A.H., and S.H. performed genotyping of United Kingdom replication samples; R.S.H. and D.C. developed protocols for recruitment of persons with CLL and sample acquisition within the United Kingdom; P.M.B. developed protocols for recruitment of persons with CLL and sample acquisition within the San Francisco study; M.J.S.D., C.D., and E.M. performed ascertainment and collection of United Kingdom CLL cases; L.R.G. and N.E.C. oversaw recruitment of persons from the NCI; J.B.W. and M.C.L. oversaw recruitment of persons from Duke University; S.L.S., J.R.C., and C.M.V. oversaw recruitment of persons from Mayo Clinic; S.S.S. oversaw recruitment of persons from the MD Anderson Cancer Center; N.J.C. oversaw recruitment of persons from Utah; L.G.S. and V.A.M. oversaw recruitment of persons from University of Minnesota/Minneapolis Veterans Administration Medical Center; and all authors contributed to the final manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Susan L. Slager, Mayo Clinic College of Medicine, 200 1st St SW Rochester, MN 55905; e-mail: slager@mayo.edu; Christine F. Skibola, Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, B84 Hildebrand Hall MC 7356, Berkeley, CA; e-mail: chrisfs@berkeley.edu; and Richard S. Houlston, Molecular and Population Genetics, Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, SM2 5NG, United Kingdom; e-mail: richard.houlston@icr.ac.uk.
Author notes
S.L.S. and C.F.S. contributed equally to this study as co–first authors.