Abstract
Introduction: Diffuse large B-cell lymphoma (DLBCL) is the most common aggressive form of non-Hodgkin lymphoma, with 30-40% of cases relapsing following frontline treatment. Molecular subtyping of DLBCL has been shown to aid in risk stratification, prognostication, and treatment selection. Although various methods for molecular subtyping exist, tissue requirements often preclude applying more than one approach to the same sample. Previous studies have suggested that adding molecular profiling improves prognostic prediction in DLBCL compared with circulating tumor DNA (ctDNA) dynamics and clearance alone. Here, we used whole-exome sequencing (WES) data generated as part of the workflow for tumor-informed ctDNA testing in patients with DLBCL to identify molecular clusters independently of other subtyping methodologies and to correlate these subtypes with ctDNA dynamics.
Methods: This study included a cohort of 200 patients with DLBCL who underwent ctDNA testing using a clinically validated mPCR-NGS ctDNA assay (Signatera™). First, we used unsupervised consensus Partitioning Around Medoids (PAM) clustering with the Jaccard distance metric to identify mutation-defined subgroups. Across a range of mutation frequency cutoffs (genes mutated in at least 2-18% of patients), the proportion of ambiguous clustering (PAC) score was used to select the optimal number of genes and clusters. To evaluate the association between gene mutations and cluster assignment, we applied multinomial logistic regression with LASSO regularization using the glmnet R package. Univariate analyses evaluated the association of the clusters with other patient and tumor features. After selecting the final clustering, we characterized the clusters and assessed whether they were associated with ctDNA status by comparing patients with persistently positive versus persistently negative ctDNA results (patients with mixed results were excluded) using the chi-square test. Standardized residuals were used to identify specific subgroups associated with ctDNA status.
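The two quantities that drive the clustering step, the Jaccard distance between binary mutation profiles and the PAC score of a consensus matrix, can be illustrated with a minimal sketch. This is not the study's pipeline: the function names, the toy patient profiles, the toy consensus matrices, and the 0.1/0.9 PAC thresholds (a commonly used convention) are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two binary mutation profiles (1 = mutated)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two all-wildtype profiles are treated as identical.
    return 0.0 if either == 0 else 1.0 - both / either


def pac_score(consensus, lower=0.1, upper=0.9):
    """Fraction of sample pairs whose consensus index falls in (lower, upper].

    consensus[i][j] is the share of resampling runs in which samples i and j
    were placed in the same cluster. The thresholds are assumed defaults.
    """
    pairs = list(combinations(range(len(consensus)), 2))
    ambiguous = sum(1 for i, j in pairs if lower < consensus[i][j] <= upper)
    return ambiguous / len(pairs)


# Hypothetical toy data, not taken from the study.
patient_a = [1, 1, 0, 0]  # mutation status across four genes
patient_b = [1, 0, 1, 0]

clean = [  # pairs always or never co-cluster
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
blurry = [  # several pairs co-cluster only some of the time
    [1.0, 1.0, 0.5, 0.0],
    [1.0, 1.0, 0.5, 0.2],
    [0.5, 0.5, 1.0, 1.0],
    [0.0, 0.2, 1.0, 1.0],
]

print(jaccard_distance(patient_a, patient_b))
print(pac_score(clean), pac_score(blurry))
```

A lower PAC score indicates that fewer sample pairs are assigned inconsistently across resampling runs, which is why it can be used to compare candidate gene panels and cluster counts.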
Results: Among the 200 patients, median age was 62.5 years (range 17-92 years) and 54% were male (N=108). Stage distribution was I: 11% (N=22); II: 16% (N=31); III: 18% (N=36); IV: 39% (N=77); unknown: 17% (N=34). Evaluation of PAC scores led to the selection of a mutation frequency cutoff of 18%, resulting in a small panel of genes (frequency range: 18.5% to 25.5%) and a set of clusters (i.e., subtypes). This selection yielded well-separated clusters with minimal inter-cluster blending. When the mutation status (wildtype or mutated) of the genes was compared across clusters, each cluster showed a distinct mutational signature, as shown by cluster-specific non-zero coefficients from the multinomial logistic regression model. Further, the clusters were not associated with other clinician-reported patient features, including sex (p=0.44), age (p=0.24), and stage (p=0.96). While there was no overall correlation with existing subtyping models, certain individual clusters overlapped with known cell-of-origin classifications. Molecular cluster was significantly associated with longitudinal ctDNA status across all lines of treatment and surveillance (p=0.048). Certain subtypes were associated with persistently ctDNA-positive results (standardized residual: 2.21; 61.5% of patients persistently positive), while other subtypes were associated with persistently ctDNA-negative results (standardized residual: 2.25; 100% of patients persistently negative).
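To show how standardized residuals localize a chi-square association to individual cells, the following sketch computes the chi-square statistic and the residuals for a cluster-by-ctDNA-status table. It assumes the adjusted (Haberman) form of the standardized residual, which the abstract does not specify. The function name and the counts are hypothetical and are not the study's data.

```python
from math import sqrt


def chi_square_with_residuals(table):
    """Chi-square statistic, degrees of freedom, and adjusted standardized
    residuals for a contingency table given as a list of rows.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            # Adjusted residual: scales by the row and column proportions so
            # that each cell is approximately standard normal under independence.
            scale = sqrt(
                expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            )
            res_row.append((observed - expected) / scale)
        residuals.append(res_row)

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof, residuals


# Hypothetical counts: rows are clusters, columns are
# (persistently ctDNA-positive, persistently ctDNA-negative).
toy_counts = [
    [10, 5],
    [5, 10],
]
chi2, dof, residuals = chi_square_with_residuals(toy_counts)
print(chi2, dof)
for row in residuals:
    print([round(r, 2) for r in row])
```

Under this definition, a cell with an absolute residual above roughly 1.96 departs from independence at about the two-sided 0.05 level, which is a common rule of thumb for reading values such as those reported above.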
Conclusions: We developed a method to infer molecular subtypes from WES data alone, demonstrating that a single sample can support both the design of a personalized ctDNA assay and molecular subtyping. The subtypes showed no overall correlation with existing classifiers and were not associated with patient age, stage, or sex. Certain subtypes were associated with ctDNA status, suggesting a potential link between genotype, molecular response, and patient outcomes. These findings support the feasibility of a unified genomic approach for subtype classification and personalized monitoring, warranting validation in larger independent cohorts with clinical outcomes.