Figure 3.
Patient-centered epigenetic analysis and machine learning predict the most potent transcriptional regulators of CD38. (A) A total of 46 transcription factors (TFs) predicted to bind to the CD38 locus were derived from motif analysis of published ATAC-seq data (see supplemental Figure 3). Gene expression of each TF was correlated with CD38 expression in the Multiple Myeloma Research Foundation (MMRF) CoMMpass database (release IA13), using RNA-seq data from CD138+ enriched tumor cells at diagnosis (n = 664 patients). The top predicted positive and negative regulators are shown, ranked by Pearson correlation (R). (B) CoMMpass RNA-seq data illustrate a strong positive correlation between XBP1 and CD38 expression. (C) An XGBoost machine learning model was used to extract the features of TF gene expression that best model CD38 expression in CoMMpass tumors (shown in log2 TPM); 80% of the data were used as a training set, with 20% left out as a test set. Coefficient of determination (R²) for the predictive model = 0.49 after five-fold cross-validation. (D) Shapley additive explanations (SHAP) analysis indicates the TFs whose expression most strongly affects CD38 expression levels in CoMMpass tumors. FPKM, fragments per kilobase million; TPM, transcripts per million.
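The ranking in panel A and the held-out evaluation in panel C can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the TF names (`TF_pos`, `TF_neg`, `TF_null`) and helper functions are hypothetical, and a one-variable least-squares line stands in for the XGBoost model purely to show how Pearson R and the coefficient of determination are computed on an 80/20 split.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rank_regulators(tf_expr, target):
    """Sort TFs from most positive to most negative Pearson R with the target."""
    scores = {tf: pearson_r(vals, target) for tf, vals in tf_expr.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x (stand-in model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Synthetic log2-scale expression values for 200 hypothetical patients.
rng = random.Random(0)
n_patients = 200
tf_expr = {
    name: [rng.gauss(5.0, 1.0) for _ in range(n_patients)]
    for name in ("TF_pos", "TF_neg", "TF_null")
}
# Assumed toy relationship: one activator, one repressor, plus noise.
cd38 = [
    0.8 * p - 0.5 * q + rng.gauss(0.0, 0.5)
    for p, q in zip(tf_expr["TF_pos"], tf_expr["TF_neg"])
]

# Panel A analogue: rank candidate regulators by Pearson R.
ranking = rank_regulators(tf_expr, cd38)

# Panel C analogue: 80% training set, 20% held-out test set.
indices = list(range(n_patients))
rng.shuffle(indices)
cut = int(0.8 * n_patients)
train_idx, test_idx = indices[:cut], indices[cut:]

slope, intercept = fit_line(
    [tf_expr["TF_pos"][i] for i in train_idx], [cd38[i] for i in train_idx]
)
test_pred = [slope * tf_expr["TF_pos"][i] + intercept for i in test_idx]
r2_test = r_squared([cd38[i] for i in test_idx], test_pred)

for name, r in ranking:
    print(f"{name}: R = {r:.2f}")
print(f"Held-out R^2 = {r2_test:.2f}")
```

Scoring on the held-out 20% rather than on the training data is what makes the reported value an estimate of how well the model generalizes to patients it has not seen.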

