Gene expression profiles enable global analysis that can interrogate the activity patterns of various cellular pathways across biological conditions. Indeed, over the past decades this approach has generated data across numerous patient populations, allowing molecular stratification of diseases, including hematological and lymphoid malignancies. An emerging theme from cancer genomics studies is the remarkable similarity between specific cancers of different lineages. For example, particular subtypes of bladder cancer resemble breast and lung tumors despite their different tissues of origin. Whether the same holds true across hematological and lymphoid cancers is currently unknown. In normal development, cells commit to their lineage by activating densely interconnected transcription factors (TFs), or TF modules, in a series of decision points at which a choice is made between alternative lineage fates. The mutual exclusivity of these TF modules can be used to discover key genes in both normal and malignant hematopoiesis. Importantly, TF translocations represent frequent genetic events in hematological cancer.
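One simple way to quantify the mutual exclusivity of two TFs is a one-sided Fisher exact test on binarized expression calls. The sketch below is illustrative only and is not the scoring used in this study: the function name `exclusivity_p_value`, the placeholder factors `tf_a_high` and `tf_b_high`, and the high/low calls are all hypothetical.

```python
from math import comb


def exclusivity_p_value(high_a, high_b):
    """One-sided Fisher exact p-value that two TFs are highly expressed
    together in fewer samples than expected by chance.

    high_a, high_b: equal-length sequences of booleans, one entry per
    sample, True where the TF is called "high" (hypothetical calls).
    """
    if len(high_a) != len(high_b):
        raise ValueError("both TFs need one call per sample")
    n = len(high_a)
    both = sum(1 for a, b in zip(high_a, high_b) if a and b)
    total_a = sum(1 for a in high_a if a)
    total_b = sum(1 for b in high_b if b)
    # Smallest overlap that is possible given the two marginal totals.
    lowest = max(0, total_a + total_b - n)
    # Lower tail of the hypergeometric distribution: P(overlap <= observed).
    tail = sum(
        comb(total_a, k) * comb(n - total_a, total_b - k)
        for k in range(lowest, both + 1)
    )
    return tail / comb(n, total_b)


# Hypothetical calls for ten samples: the two factors never overlap.
tf_a_high = [True] * 5 + [False] * 5
tf_b_high = [False] * 5 + [True] * 5
p_exclusive = exclusivity_p_value(tf_a_high, tf_b_high)

# A factor compared with itself overlaps completely, so the lower tail is 1.
p_same = exclusivity_p_value(tf_a_high, tf_a_high)
```

A small p-value indicates that the two factors are co-expressed less often than their individual frequencies would predict. In practice the threshold used to call a TF "high" and the correction for testing many TF pairs would both need to be chosen with care.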

We harnessed computational methods to organize and characterize samples from 9544 distinct hematological and lymphoid cancer patients, healthy donors and pre-malignant stages, generating a pan-cancer resource for interrogating their molecular states. A central part of the resource is a curated transcriptome dataset spanning 37 different disease subtypes, which we provide as an interactive online resource (http://compbio.uta.fi/hemap/). The dimensionality reduction method known as t-Distributed Stochastic Neighbor Embedding (t-SNE) achieved optimal placement of highly similar samples in close proximity in two dimensions, enabling a biologically meaningful visualization of the dataset as well as comparative analysis based on gene signatures, drug target expression and regulatory network state. For patient stratification, unsupervised clustering in t-SNE space yielded performance comparable to that of robust and reproducible classifiers. We further demonstrate with multilevel data from The Cancer Genome Atlas that new samples can be included in the context of the existing patient profiles. Data integration highlights the molecular architecture that relates to the clinical and genetic features of the samples studied, revealing new insights into the molecular phenotypes that distinguish AML samples lacking a subtype under current clinical stratification. Finally, we used the resource to provide a roadmap for candidate drug therapies and to quantify the regulatory network alterations across hematological malignancies.
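The abstract does not name the clustering algorithm applied in t-SNE space. As a minimal sketch of the general idea, the example below runs a DBSCAN-style density clustering on two-dimensional coordinates that are assumed to come from a t-SNE embedding computed beforehand. The coordinates, the function name `dbscan`, and the parameters `eps` and `min_pts` are hypothetical and chosen only for illustration.

```python
import math


def dbscan(points, eps, min_pts):
    """Density-based clustering of 2D points.

    Returns one label per point: 0, 1, ... for clusters, -1 for noise.
    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it.
    """
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # core point: keep growing the cluster
    return labels


# Hypothetical embedding coordinates: two tight groups and one isolated sample.
coords = [
    (0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
    (5.0, 5.0),
    (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (10.5, 10.5),
]
labels = dbscan(coords, eps=1.0, min_pts=3)
```

With these toy coordinates the two tight groups form separate clusters and the isolated sample is labeled as noise. Distances in a t-SNE map are not directly comparable across embeddings, so `eps` would need to be tuned for each map.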

The divergence of cancer regulatory networks from the reference healthy cell states, together with the mutually exclusive patterns of TF expression that are specific to the different malignancies, paves the way towards therapies targeting the cancer epigenome and towards characterization of the downstream targets of TF fusions or aberrant enhancer usage, as exemplified by independent validation data at the IRX1 and ERG loci.

Disclosures

No relevant conflicts of interest to declare.

