To the editor:

Langerhans cell histiocytosis (LCH) is a hematologic disorder that presents with a wide spectrum of symptoms, ranging from focal lesions to potentially lethal multiorgan disease, affecting 4 to 8 per million children per year1  and 1 to 2 per million adults per year.2  Mutually exclusive activating somatic mutations have been identified in ∼85% of cases, and extracellular signal-regulated kinase activation is universal in LCH dendritic cells (DCs; as reviewed in Rollins3 ). BRAF-V600E (50–65%) is the most common mutation, with alternative genomic alterations in BRAF, MAP2K1, ARAF, and other MAPK pathway genes.3,4  A model is emerging in which somatic genomic alterations that activate MAPK signaling at critical stages of myeloid differentiation drive LCH pathogenesis and define disease severity.5,6 

Random acquisition of cell-specific somatic mutations may not completely explain LCH pathogenesis. Data from the US Surveillance, Epidemiology, and End Results program indicate that incidence of high-risk LCH is lower among African Americans compared with Caucasians and higher among Hispanics compared with non-Hispanics.7  This is similar to other pediatric cancers where there are documented differences in incidence by race/ethnicity,8  genome-wide association studies (GWAS) have successfully identified novel susceptibility loci,9  and some of these risk alleles vary by genetic ancestry.10  Racial/ethnic differences in BRAF-V600E exist in other cancers characterized by MAPK activation, including colorectal cancer.11  We therefore tested the role of inherited genetic variation on LCH risk through a GWAS using a case-parent trio design (for a study overview, see supplemental Figure 1, available on the Blood Web site).

LCH families (case-parent trios and duos) were recruited from an institutional cohort between 2010 and 2015. Both trios and duos were eligible for inclusion at the time of recruitment. Informed consent was obtained from participants under a protocol approved by the Baylor College of Medicine Institutional Review Board. Inheritance under Mendelian expectation was assessed using the case-parent trio design for genetic association testing.12,13  This design was chosen because its genetic association tests are not vulnerable to population stratification bias: the Mendelian distribution of alleles does not depend on racial/ethnic group.13,14 
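To illustrate why family-based tests are robust to population stratification, the sketch below implements the classic transmission disequilibrium test (TDT). This is a simpler stand-in, not the log-linear method used in this study. It only counts alleles transmitted versus not transmitted by heterozygous parents, so each family serves as its own control. The function name and the counts are hypothetical.

```python
import math

def tdt(transmitted, untransmitted):
    """McNemar-type TDT: chi-square statistic (1 df) and its P value.

    transmitted / untransmitted: number of times heterozygous parents did
    or did not pass the allele of interest to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a 1-df chi-square equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts, not data from this study.
chi2, p_value = tdt(transmitted=30, untransmitted=10)
null_chi2, null_p = tdt(transmitted=20, untransmitted=20)
```

Under Mendelian inheritance each allele of a heterozygous parent is transmitted with probability 0.5 regardless of ancestry, which is why equal counts give no evidence of association.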

Germ line DNA was genotyped on the Omni-5 Quad BeadChip (Illumina, San Diego, CA). After quality control measures to filter poor-performing single nucleotide polymorphisms (SNPs) and samples, 118 LCH families (n = 87 case-parent trios; n = 31 case-parent duos) were included. Among these families, high-quality genotype data for 1 672 105 germ line SNPs were tested for an association with LCH using PREMIM-EMIM software, which incorporates a multinomial log-likelihood approach for assessing case-parent trio and duo GWAS data.12,13,15,16  Case-parent trios and case-mother or case-father duos can be analyzed simultaneously in EMIM to estimate inherited genetic effects.15,16  The EMIM parameters were as follows: R2 = R1^2, with all other parameters set to 0; that is, a log-additive model of inheritance was used to evaluate each SNP, coded as 0, 1, or 2 copies of the minor allele. Effect estimates, 95% confidence intervals (CIs), and corresponding P values were estimated from EMIM output using R (http://www.R-project.org/). The National Institutes of Health (NIH), National Human Genome Research Institute’s (NHGRI’s) thresholds for genome-wide significance (P ≤ 5.0 × 10−8) and suggestive significance (P ≤ 1.0 × 10−5) were used to interpret the results. In the discovery stage, we set the statistical level of significance at P ≤ 1.0 × 10−6 as the threshold for replication locus selection.
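As a minimal sketch of the coding and model described above: under a log-additive model, the relative risk for two copies of the minor allele is the square of the risk for one copy, and a confidence interval can be formed on the log scale. The function names are hypothetical, the EMIM output format is not reproduced, and the Wald-type interval is an assumption because the exact R calculation is not specified here.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal

def count_minor_alleles(genotype, minor):
    """Additive coding: 0, 1, or 2 copies of the minor allele."""
    return sum(1 for allele in genotype if allele == minor)

def log_additive_risk(r1, copies):
    """Relative risk for a given copy number when R2 = R1 squared."""
    return r1 ** copies

def estimate_with_ci(log_estimate, se):
    """Exponentiate a log-scale estimate and its Wald 95% limits."""
    return (
        math.exp(log_estimate),
        math.exp(log_estimate - Z95 * se),
        math.exp(log_estimate + Z95 * se),
    )

# Hypothetical inputs for illustration only.
codes = [count_minor_alleles(g, "A") for g in ("GG", "AG", "AA")]
risks = [log_additive_risk(3.0, c) for c in codes]
estimate, lower, upper = estimate_with_ci(0.0, 0.5)
```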

In the discovery GWAS, 9 inherited genomic regions were identified with SNPs associated with LCH susceptibility at NHGRI’s level of suggestive genome-wide significance (P ≤ 1.0 × 10−5). One SNP surpassed the more stringent level of statistical significance set by study investigators (P ≤ 1.0 × 10−6; Figure 1) and was selected for replication. The variant allele of SMAD6 rs12438941 (G>A) was significantly associated with an increased risk of developing LCH (odds ratio [OR]discovery = 12.57; 95% CI: 3.00-52.65; P = 7.99 × 10−7; Table 1).
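The three significance tiers used in this study can be expressed as a small classifier. This is an illustrative sketch; the constant names, function name, and labels are hypothetical, and only the thresholds come from the text.

```python
GENOME_WIDE = 5.0e-8   # NHGRI genome-wide significance
REPLICATION = 1.0e-6   # investigators' threshold for replication selection
SUGGESTIVE = 1.0e-5    # NHGRI suggestive significance

def classify(p_value):
    """Return the most stringent tier that a P value satisfies."""
    if p_value <= GENOME_WIDE:
        return "genome-wide"
    if p_value <= REPLICATION:
        return "selected for replication"
    if p_value <= SUGGESTIVE:
        return "suggestive"
    return "not significant"

discovery_tier = classify(7.99e-7)   # discovery P value for rs12438941
pooled_tier = classify(1.29e-11)     # pooled P value from Table 1
```

With these thresholds, the discovery result qualifies for replication without being genome-wide significant on its own, while the pooled result is genome-wide significant.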

Figure 1.

Regional association plot highlighting the genomic region of the locus selected from discovery GWAS for replication. Recombination rate and linkage disequilibrium with SMAD6 rs12438941 (red diamond) shown for chromosome 15 using SNP Annotation and Proxy Search. Case-parent trio GWAS results were used to derive the P values used to create the figure. Each square represents a specific SNP in the genomic region, and the color of the filled squares depicts the r2 between that SNP and the most strongly associated SNP in our study population; the more highly correlated a SNP is with our locus of interest, the redder the box coloration. The blue lines indicate the recombination rate of this genomic region in the 1000 Genomes CEU (Northern Europeans from Utah) population.

Table 1.

Summary of association results in discovery GWAS and replication cohorts

dbSNP locus | Alleles* | Stage | OR (95% CI) | P
SMAD6 rs12438941 | (A) vs G | Discovery GWAS | 12.57 (3.00-52.65) | 7.99 × 10−7
 | | Replication cohort | 3.39 (2.29-5.03) | 1.30 × 10−9
 | | Pooled estimate | 3.72 (2.54-5.44) | 1.29 × 10−11

* Risk allele in parentheses.

Estimated under an additive genetic model.

Tested under a fixed-effect model.

A case-control study design was used as an orthogonal method to replicate this discovery GWAS result. LCH cases (n = 132) were recruited from an institutional cohort with the same processes as the discovery series. In both the discovery GWAS and replication cohorts, the majority of cases had low-risk organ involvement (77.1% and 86.4%, respectively; supplemental Table 1), multisystem disease (43.2% and 36.8%, respectively), multiple disease sites (64.4% and 67.4%, respectively), and had not experienced an LCH relapse event (54.2% and 56.1%, respectively). When considering the distribution of this SNP by patient characteristics, only the degree of disease dissemination was suggested to differ by SMAD6 genotype (P = .04; supplemental Table 2).

The replication control cohort was drawn from 2 authorized-access data sets obtained from the NIH, National Center for Biotechnology Information Database of Genotypes and Phenotypes (dbGaP).17  Specifically, subjects from the Genotype-Tissue Expression and Genetic Architecture of Smoking and Smoking Cessation studies were used as non-LCH controls because the prevalence of adult survivors of LCH is <4 in 1 000 000.18  Quality-controlled genotype data for this locus were abstracted from dbGaP for 1645 controls. Germ line DNA from LCH cases included for replication was genotyped using TaqMan primers specific to SMAD6 rs12438941 (Life Technologies, Carlsbad, CA). Logistic regression was used to generate an OR, 95% CI, and P value for this association.

In the replication cohort, this association remained statistically significant (ORreplication = 3.39; 95% CI: 2.29-5.03; P = 1.30 × 10−9; Table 1). A fixed-effects meta-analysis was performed to combine results across the discovery and replication stages, yielding a genome-wide level of significance (summary OR = 3.72; 95% CI: 2.54-5.44; P = 1.29 × 10−11). The summary OR identified in this study is consistent with those reported in other GWAS of pediatric cancers,19  and with the larger effect sizes predicted for rarer variants in pediatric conditions, which reduce sample size requirements.20 
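The pooled estimate can be approximately reproduced with a standard inverse-variance fixed-effect calculation. The sketch below back-calculates each standard error from the published 95% CI, which is an assumption; the software and exact inputs used for the meta-analysis are not stated here, so small rounding differences from Table 1 are expected. Function names are hypothetical.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal

def log_or_and_se(odds_ratio, lower, upper):
    """Recover log(OR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(odds_ratio), se

def fixed_effect_pool(studies):
    """Inverse-variance pooling of (OR, lower, upper) tuples."""
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in studies:
        beta, se = log_or_and_se(odds_ratio, lower, upper)
        weight = 1.0 / se ** 2
        total_weight += weight
        weighted_sum += weight * beta
    beta = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    # Two-sided P value from the normal approximation.
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return (
        math.exp(beta),
        math.exp(beta - Z95 * se),
        math.exp(beta + Z95 * se),
        p_value,
    )

# Discovery and replication estimates as reported in Table 1.
pooled_or, pooled_lo, pooled_hi, pooled_p = fixed_effect_pool(
    [(12.57, 3.00, 52.65), (3.39, 2.29, 5.03)]
)
```

Because the replication estimate has a much narrower CI, it carries most of the weight, which is why the pooled OR sits close to the replication OR rather than the discovery OR.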

We identified a novel risk variant within SMAD6 that significantly increases the risk of developing LCH; the association was replicated in a separate cohort of LCH patients and reached genome-wide significance in joint analysis. In 1000 Genomes phase 3 data, the SMAD6 rs12438941 risk allele is enriched in Mexican ancestral (minor allele frequency [MAF] = 0.25) and Peruvian (MAF = 0.36) populations compared with Chinese (MAF = 0.00), Japanese (MAF = 0.00), and Nigerian populations (MAF = 0.00). This difference in risk allele frequency suggests that this locus may contribute to the difference in LCH incidence observed between Hispanic and non-Hispanic populations.7  SMAD6 rs12438941 is located in the intronic region between exon 3 and exon 4. Notably, intronic SNPs can influence splicing, resulting in altered structure and function of the protein product,21  and ∼93% of GWAS hits identified are in noncoding regions.22  Although this SNP has not been reported in previous studies of LCH, several SMAD6 polymorphisms are associated with aberrant biologic events, including an increased risk of brain metastasis in patients with non–small-cell lung cancer.23  It remains unclear whether the SNP identified in this study is the causal allele impacting LCH risk or whether SMAD6 rs12438941 is simply a proxy for another SNP that is truly the causal locus. Deep resequencing of SMAD6 is warranted to further elucidate this association.

The SMAD6 protein is an inhibitory SMAD for bone morphogenetic protein (BMP) and/or transforming growth factor-β (TGF-β)/activin signaling.24  Both BMP7-ALK3 (canonical BMP signaling) and TGF-β1-ALK5 (canonical TGF-β1 signaling) serve as determining factors for epidermal Langerhans cell differentiation from human CD34+ hematopoietic progenitor cells.25  However, future studies are needed to determine whether variations in BMP signaling impact lineage differentiation and/or survival of myeloid DC progenitor cells in LCH. Further, although the risk allele identified in this study is within SMAD6, there are other mechanisms through which this locus may impact LCH pathogenesis.26,27  This risk allele might act through an effect on other genes in the same region (including MAP2K1; Figure 1) or through genes at distal sites.

In conclusion, this study evaluated the role of inherited genetic variation in LCH susceptibility and identified a novel risk variant in SMAD6. The current paradigm of LCH pathogenesis is based on activating somatic mutations in MAPK pathway genes, which fails to account for differential risks of disease across racial/ethnic groups. These results provide evidence that inherited genetic variation impacts the risk of acquiring LCH.

The online version of this article contains a data supplement.

Acknowledgments: This work was supported in part by funding from the Spit-for-a-Cure Project with the HistioCure Foundation to the TXCH Histiocytosis Program. This work was also supported by National Institutes of Health (NIH), National Cancer Institute (NCI) grants R01-CA154489 (C.E.A. and K.L.M.), R01-CA154947 (M.M. and C.E.A.), and R25-CA160078 (Training Program in Pediatric Cancer Epidemiology and Control; E.C.P.-G. and M.E.S.), NIH Specialized Programs of Research Excellence in Lymphoma grant P50-CA126752 (C.E.A.), the St. Baldrick’s Foundation (North American Consortium for Histiocytosis Research, C.E.A., K.L.M., and M.M.), an Alex’s Lemonade Stand Foundation Epidemiology Grant (P.J.L.), an Alex’s Lemonade Stand Foundation Young Investigator Grant (R.C.), a Thrasher Research Fund Early Career Award (E.C.P.-G.), an American Society of Hematology Scholar Award (E.C.P.-G.), an American Society of Hematology Scholar Award (R.C.), Cookies for Kids Cancer (C.E.A.), and Liam’s Lighthouse Foundation (C.E.A. and K.L.M.). This work was also supported by shared resources from Dan L. Duncan Cancer Center support NCI grant P30-CA125123. The authors also acknowledge the contribution of data from Genetic Architecture of Smoking and Smoking Cessation accessed through dbGaP. Funding support for genotyping, which was performed at the Center for Inherited Disease Research (CIDR), was provided by NIH, National Human Genome Research Institute (NHGRI) grant 1 X01 HG005274-01. CIDR is fully funded through contract HHSN268200782096C from the NIH to The Johns Hopkins University. Assistance with genotype cleaning, as well as with general study coordination, was provided by the Gene Environment Association Studies Coordinating Center (NHGRI grant U01 HG004446).
Funding support for the collection of data sets and samples was provided by the Collaborative Genetic Study of Nicotine Dependence (NCI grant P01 CA089392) and the University of Wisconsin Transdisciplinary Tobacco Use Research Center (NIH, National Institute on Drug Abuse [NIDA] grant P50 DA019706 and NCI grant P50 CA084724). The Genotype-Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the NIH (commonfund.nih.gov/GTEx). Additional funds were provided by the NIH, NCI, NHGRI, National Heart, Lung, and Blood Institute, NIDA, National Institute of Mental Health [NIMH], and National Institute of Neurological Disorders. Donors were enrolled at Biospecimen Source Sites funded by NCI\Leidos Biomedical Research, Inc. subcontracts to the National Disease Research Interchange (10XS170), the Roswell Park Cancer Institute (10XS171), and Science Care, Inc. (X10S172). The Laboratory, Data Analysis, and Coordinating Center was funded through Department of Health and Human Services contract HHSN268201000029C to the Broad Institute. Biorepository operations were funded through a Leidos Biomedical Research, Inc. subcontract to Van Andel Research Institute (10ST1035). Additional data repository and project management were provided by Leidos Biomedical Research, Inc. (Department of Health and Human Services contract HHSN261200800001E). The Brain Bank was supported by supplements to University of Miami grant DA006227. NIMH statistical methods development grants were made to the University of Geneva (MH090941 and MH101814), the University of Chicago (MH090951, MH090937, MH101825, and MH101820), the University of North Carolina, Chapel Hill (MH090936), North Carolina State University (MH101819), Harvard University (MH090948), Stanford University (MH101782), Washington University (MH101810), and the University of Pennsylvania (MH101822).
The data sets used for the analyses described in this manuscript were obtained from dbGaP at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession numbers phs000424.v5.p1 (GTEx) and phs000404.v1.p1 (Genetic Architecture of Smoking and Smoking Cessation).

Contribution: E.C.P.-G. and R.C. designed and performed the research, collected the data, analyzed and interpreted the data, and wrote the first version of the manuscript; M.E.S. analyzed and interpreted the data and wrote the manuscript; J.W.B. provided feedback on genotyping methods and data quality issues; H.A., A.G.S., B.P.S., O.E., D.J.Z., A.S., and L.M. performed research and collected data; P.J.L., M.M., K.L.M., D.W.P., and C.E.A. designed the research, analyzed and interpreted the data, and wrote the manuscript; E.C.P.-G., R.C., P.J.L., and C.E.A. contributed equally to this study; and all authors read and approved the final manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Philip J. Lupo, Section of Pediatric Hematology-Oncology, Department of Pediatrics, Baylor College of Medicine, Room ABBR-R516, Mail Stop BCM305, Houston, TX 77030; e-mail: philip.lupo@bcm.edu; and Carl E. Allen, Texas Children's Cancer and Hematology Centers, 1102 Bates St, Suite 1025.22, Houston, TX 77030; e-mail: ceallen@txch.org.

1. Alston RD, Tatevossian RG, McNally RJ, Kelsey A, Birch JM, Eden TO. Incidence and survival of childhood Langerhans cell histiocytosis in Northwest England from 1954 to 1998. Pediatr Blood Cancer. 2007;48(5):555-560.
2. Baumgartner I, von Hochstetter A, Baumert B, Luetolf U, Follath F. Langerhans’-cell histiocytosis in adults. Med Pediatr Oncol. 1997;28(1):9-14.
3. Rollins BJ. Genomic Alterations in Langerhans Cell Histiocytosis. Hematol Oncol Clin North Am. 2015;29(5):839-851.
4. Chakraborty R, Burke TM, Hampton OA, et al. Alternative genetic mechanisms of BRAF activation in Langerhans cell histiocytosis. Blood. 2016;128(21):2533-2537.
5. Berres ML, Lim KP, Peters T, et al. BRAF-V600E expression in precursor versus differentiated dendritic cells defines clinically distinct LCH risk groups [published correction appears in J Exp Med 2015; 212(2):281]. J Exp Med. 2014;211(4):669-683.
6. Collin M, Bigley V, McClain KL, Allen CE. Cell(s) of origin of Langerhans cell histiocytosis. Hematol Oncol Clin North Am. 2015;29(5):825-838.
7. Ribeiro KB, Degar B, Antoneli CB, Rollins B, Rodriguez-Galindo C. Ethnicity, race, and socioeconomic status influence incidence of Langerhans cell histiocytosis. Pediatr Blood Cancer. 2015;62(6):982-987.
8. Oksuzyan S, Crespi CM, Cockburn M, Mezei G, Vergara X, Kheifets L. Race/ethnicity and the risk of childhood leukaemia: a case-control study in California. J Epidemiol Community Health. 2015;69(8):795-802.
9. Xu H, Cheng C, Devidas M, et al. ARID5B genetic polymorphisms contribute to racial disparities in the incidence and treatment outcome of childhood acute lymphoblastic leukemia. J Clin Oncol. 2012;30(7):751-757.
10. Swinney RM, Beuten J, Collier AB III, et al. Polymorphisms in CYP1A1 and ethnic-specific susceptibility to acute lymphoblastic leukemia in children. Cancer Epidemiol Biomarkers Prev. 2011;20(7):1537-1542.
11. Yoon HH, Shi Q, Alberts SR, et al; Alliance for Clinical Trials in Oncology. Racial differences in BRAF/KRAS mutation rates and survival in stage III colon cancer patients. J Natl Cancer Inst. 2015;107(10):djv186.
12. Weinberg CR, Wilcox AJ, Lie RT. A log-linear approach to case-parent-triad data: assessing effects of disease genes that act either directly or through maternal effects and that may be subject to parental imprinting. Am J Hum Genet. 1998;62(4):969-978.
13. Wilcox AJ, Weinberg CR, Lie RT. Distinguishing the effects of maternal and offspring genes through studies of “case-parent triads.” Am J Epidemiol. 1998;148(9):893-901.
14. Yu K, Wang Z, Li Q, et al. Population substructure and control selection in genome-wide association studies. PLoS One. 2008;3(7):e2551.
15. Howey R, Cordell HJ. PREMIM and EMIM: tools for estimation of maternal, imprinting and interaction effects using multinomial modelling. BMC Bioinformatics. 2012;13:149.
16. Kistner EO, Weinberg CR. Method for using complete and incomplete trios to identify genes related to a quantitative trait. Genet Epidemiol. 2004;27(1):33-42.
17. Mailman MD, Feolo M, Jin Y, et al. The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007;39(10):1181-1186.
18. Willis B, Ablin A, Weinberg V, Zoger S, Wara WM, Matthay KK. Disease course and late sequelae of Langerhans’ cell histiocytosis: 25-year experience at the University of California, San Francisco. J Clin Oncol. 1996;14(7):2073-2082.
19. Perez-Andreu V, Roberts KG, Harvey RC, et al. Inherited GATA3 variants are associated with Ph-like childhood acute lymphoblastic leukemia and risk of relapse. Nat Genet. 2013;45(12):1494-1498.
20. Agopian AJ, Eastcott LM, Mitchell LE. Age of onset and effect size in genome-wide association studies. Birth Defects Res A Clin Mol Teratol. 2012;94(11):908-911.
21. Wang GS, Cooper TA. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet. 2007;8(10):749-761.
22. Maurano MT, Humbert R, Rynes E, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337(6099):1190-1195.
23. Li Q, Wu H, Chen B, et al. SNPs in the TGF-β signaling pathway are associated with increased risk of brain metastasis in patients with non-small-cell lung cancer. PLoS One. 2012;7(12):e51713.
24. Ruschke K, Meier C, Ullah M, et al. Bone morphogenetic protein 2/SMAD signalling in human ligamentocytes of degenerated and aged anterior cruciate ligaments. Osteoarthritis Cartilage. 2016;24(10):1816-1825.
25. Yasmin N, Bauer T, Modak M, et al. Identification of bone morphogenetic protein 7 (BMP7) as an instructive factor for human epidermal Langerhans cell differentiation. J Exp Med. 2013;210(12):2597-2610.
26. Claussnitzer M, Dankel SN, Kim K-H, et al. FTO Obesity Variant Circuitry and Adipocyte Browning in Humans. N Engl J Med. 2015;373(10):895-907.
27. Timberlake AT, Choi J, Zaidi S, et al. Two locus inheritance of non-syndromic midline craniosynostosis via rare SMAD6 and common BMP2 alleles. eLife. 2016;5:e20125.

Author notes

*

E.C.P.-G., R.C., P.J.L., and C.E.A. contributed equally to this work.
