CLL exhibits great variability in genetic alterations and clinical outcome across patients. Gene expression can serve as an intermediate measurement to provide insight into the cellular circuitry linking genotype to phenotype. To explore this approach, we developed methods to statistically assess the associations between somatic mutations, transcriptional programs, and clinical outcome.

First, in a cohort of 229 CLL patients, we associated IGHV mutation status and 7 recurrent (>5% of CLLs) somatic alterations with 44 modules of co-expressed genes, derived by clustering 3719 variably expressed genes measured on Affymetrix arrays. Somatic alterations were identified by whole-exome sequencing (WES; N=130) and/or SNP6 arrays (N=229). By permutation testing, we determined that 21 modules were associated with IGHV mutation status (P<0.001). After controlling for IGHV mutation status, we identified genotypes associated with independent expression modules (P<0.001): trisomy 12 [n=10], ATM/del11q [n=12], SF3B1 [n=6], TP53/del17p [n=5], del13q [n=2], MYD88 alone [n=1], and NOTCH1 [n=0], with some modules associated with more than one genotype. Of these candidate genotype-module associations, 63% were validated using an independent dataset of matched WES and gene expression data from 82 patients (Puente et al, Nat Gen, 2011) (P<0.1; i.e., of the validated 63%, fewer than 3 are expected to be false positives).
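As an illustration of the permutation testing mentioned above, the following is a minimal sketch only. The abstract does not state the test statistic, so the difference in mean module expression between mutated and wild-type patients is assumed here; the function name `permutation_pvalue` and its inputs are hypothetical.

```python
import random
from statistics import mean

def permutation_pvalue(scores, mutated, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the difference in mean module
    score between mutated and wild-type patients.

    scores  : per-patient module expression summary (hypothetical input)
    mutated : per-patient booleans, True if the alteration is present
    """
    rng = random.Random(seed)

    def abs_mean_diff(labels):
        mut = [s for s, m in zip(scores, labels) if m]
        wt = [s for s, m in zip(scores, labels) if not m]
        return abs(mean(mut) - mean(wt))

    observed = abs_mean_diff(mutated)
    labels = list(mutated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-expression link
        if abs_mean_diff(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

One way to control for IGHV mutation status within this scheme would be to shuffle labels separately inside the IGHV-mutated and IGHV-unmutated strata; this is an assumption about a possible design, not a description of the method used.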

Second, having identified transcriptional modules associated with distinct genotypes, we sought to understand the functions of these modules and to infer the regulators of these programs. To associate modules with potential functional phenotypes, we performed gene enrichment analysis and found multiple modules associated with inflammatory signatures, DNA repair, and MYC targets. We then populated the intermediary layer between genotype and module with candidate transcription factors (TFs) by integrating curated TF datasets with TF expression, motif analysis, and module expression. For example, CREB and ATF were candidate TF regulators in modules associated with SF3B1 and ATM mutations, respectively, while MYC- and NFkB-related TFs were candidate regulators of modules associated with both trisomy 12 and MYD88 mutations. We also identified several candidates of cellular convergence, where multiple genotypes lead to activation of the same transcriptional program. For example, the transcription factor EBF1, an important B cell regulator, was nominated as a candidate regulator in 8 of 44 modules that were associated with differing genotypes, suggesting the importance of EBF1 in mediating the genotype-phenotype relationship.
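The enrichment step can be illustrated with a hypergeometric over-representation test. This is a sketch under an assumption: the abstract does not name the enrichment method, and the function `enrichment_pvalue` and its parameters are hypothetical.

```python
from math import comb

def enrichment_pvalue(universe_size, set_size, module_size, overlap):
    """Upper-tail hypergeometric probability: the chance of seeing at least
    `overlap` gene-set members when `module_size` genes are drawn at random
    from a universe of `universe_size` genes containing `set_size` members."""
    upper = min(set_size, module_size)
    total = comb(universe_size, module_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

For instance, `enrichment_pvalue(3719, 150, 80, 12)` would ask how surprising it is for an 80-gene module to contain 12 members of a 150-gene set when the universe is the 3719 variably expressed genes; the set size, module size, and overlap in this call are made-up numbers for illustration.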

Third, to complete the map from genotype to phenotype, we linked module expression with clinical outcome. Using elastic-net Cox regression, we identified 2 modules associated with longer and 6 with shorter time from sample acquisition to treatment or death. Many of these modules were associated with well-established prognostic indicators (6 with IGHV status, 1 with TP53/del17p, and 1 with SF3B1 status; P<0.001). One module lacking statistical association with any genotype was enriched for stem cell (P=5×10^-11) and immune system activation (P=2×10^-6) gene sets. We used the expression-based Cox regression index to classify patients into high- and low-risk subgroups (log-rank P=6.1×10^-9). Ongoing work seeks to assess the predictive power of our gene-expression signature in relation to traditional prognostic measures, as well as to further annotate the outcome-associated modules.
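The comparison of high- and low-risk subgroups can be illustrated with a two-group log-rank test. This sketch does not reproduce the elastic-net Cox model; it assumes patients have already been split into two groups by some risk index, and the function `logrank_test` and its inputs are hypothetical.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times  : follow-up time per patient (e.g. sample acquisition to event)
    events : 1 if the event (treatment or death) was observed, 0 if censored
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)          # events at time t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0  # no informative event times
    chi2 = obs_minus_exp ** 2 / variance
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))
```

Group A would hold, for example, the patients classified as high risk and group B those classified as low risk; a small p-value indicates that the two time-to-event curves differ.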

In summary, this analysis serves as a proof of principle for a ‘genotype-phenotype map’ for CLL linking somatic alterations, gene expression programs, and clinical outcome. Our inferred CLL networks generate testable hypotheses about how genotypes affect the cellular circuitry of CLL cells; these hypotheses are currently being tested through functional gain- and loss-of-function experiments.

Disclosures:

Brown: Pharmacyclics, Genentech, Celgene, Emergent, Onyx, Sanofi Aventis, Vertex, Avila, Novartis: Consultancy; Genzyme, Celgene: Research Funding.

Author notes

* Asterisk with author names denotes non-ASH members.
