Key Points
Computational evaluation of axi-cel across University of California Health confirms its RW benefit and toxicity burden in diverse patients.
An ML model predicts early relapse using 7 factors collected 24 hours after infusion, aiding early intervention to improve outcomes.
Accumulating real-world (RW) evidence of axicabtagene ciloleucel (axi-cel) has demonstrated comparable performance to that of pivotal trials. However, ∼57% of patients eventually relapse, with most requiring additional therapies. Identifying patients at risk of early relapse enables clinicians to consider additional interventions to extend survival outcomes. This study aimed to first comprehensively evaluate the RW performance of axi-cel in the multicenter University of California Health Systems using automated computational approaches. Second, we developed a decision tree machine learning (ML) model to identify patients at risk of early relapse within 6 months. A total of 416 adult patients with diffuse large B-cell lymphoma (DLBCL) receiving axi-cel between 2017 and 2024 were included in the study. The median progression-free survival (PFS) and overall survival (OS) were 10.1 and 54.4 months; the 18-month PFS and OS rates were 41.6% and 65.5%, respectively. Severe cytokine release syndrome (CRS) and immune effector cell–associated neurotoxicity syndrome (ICANS) were observed in 18.8% and 32.5% of patients, respectively. The ML model, relying on age and 6 routinely measured laboratory tests (lactate dehydrogenase, C-reactive protein, ferritin, hematocrit, platelet count, and prothrombin time), achieved a high area under the receiver operating characteristic curve score of 0.82. The decision curve analysis indicated positive net benefit of the model across a broad range (0-0.7) of decision thresholds, suggesting clinical utility in diverse scenarios. This study further confirmed the RW performance of axi-cel in diverse populations. Our ML model represented a novel approach to identify patients who may benefit from additional interventions to extend survival outcomes. After a prospective confirmation study, our model and decision tree approach may support clinical decision making in patients with high-risk DLBCL.
Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most prevalent subtype of non-Hodgkin lymphoma accounting for 4% of all cancers in the United States.1 Between 30% and 40% of patients become refractory or relapse after chemoimmunotherapy and receive salvage chemotherapies followed by autologous stem cell transplant.2 Axicabtagene ciloleucel (axi-cel), a chimeric antigen receptor T-cell (CAR-T) therapy, demonstrated superior efficacy to historical outcomes in the pivotal ZUMA-1 trial in patients with relapsed/refractory DLBCL.3-5 It was approved for second-line treatment for patients with early relapse or refractory disease.3,6-8 Despite high response rates, ∼60% of patients eventually relapse, mostly within the first year.4,9 These failures lead to poor survival outcomes (eg, median progression-free survival [PFS] and overall survival [OS] of 3 and 5 months, respectively).10-12
Axi-cel is also associated with serious toxicities, including cytokine release syndrome (CRS) and immune effector cell–associated neurotoxicity syndrome (ICANS),13 and rare but reported cases of secondary T-cell lymphoma, with estimated rates of 0.068% to 0.22%.14-17 There is a critical need for clinically interpretable strategies to identify patients at risk for early relapse and toxicity to improve outcomes.
Accumulating real-world (RW) evidence suggests similar survival outcomes to those in ZUMA-1.18 However, most RW studies focused primarily on survival and CRS and ICANS rates,19-30 with limited characterization of other toxicities. Although previous studies have explored prognostic factors associated with severe toxicities31,32 or survival outcomes, none have used emerging machine learning (ML) techniques to develop predictors of outcomes to guide treatment.33,34 Here, we applied a novel ML approach to large-scale clinical data, demonstrating its utility in predicting early relapses and informing patient care.
This retrospective study is, to our knowledge, the first multicenter evaluation of axi-cel across the University of California Health Systems (UC Health), aiming to assess RW outcomes and develop a novel ML model to identify patients at risk for early relapse within 6 months. Our findings provide important insights into improving clinical care to further extend survival benefits while minimizing potential toxicities.
Methods
The following sections pertaining to study design and analyses are described succinctly herein to aid interpretation; full details are in the supplemental Appendix. This study followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) reporting guideline for prediction algorithm validation,35 and was approved by the institutional review board at UC San Francisco (protocol 20-29908).
Data source
Deidentified patient-level data were extracted from the University of California Health Data Warehouse (UCHDW) on 6 December 2024. The UCHDW36 contains harmonized electronic health records (EHRs) across 6 UC academic health centers, with linkage to the California Electronic Death Registry System to identify deaths that occurred outside of UC Health. Some UC Los Angeles patients may have been included in a previous RW study.19
Study population
This retrospective study included all 416 adult patients with DLBCL who received axi-cel at UC Health between October 2017 and November 2024.
End points and variables
Primary end points included survival and safety outcomes, and the area under the receiver operating characteristic curve (AUROC) score of a decision tree ML model predicting early relapse within 6 months after infusion. Survival outcomes included the median PFS and OS and the PFS and OS rates at 18 months; safety outcomes included the rates of severe (grade ≥3) CRS and ICANS. Secondary end points included the net benefit of the ML model, rates and durations of other severe toxicities (eg, hematological, organ function, and electrolyte/metabolic toxicities) with onset occurring within 30 days after infusion, hazard ratios (HRs) of factors associated with PFS, and biomarkers associated with severe CRS and ICANS. PFS was time from axi-cel infusion to progression, recurrence, or death; OS was time from axi-cel infusion to death. Secondary T-cell malignancy rates were also reported.
Patient-level variables, including diagnosis, medications, procedures, and laboratory test results, were used to characterize patients’ baseline characteristics, procedures to receive axi-cel (Table 1), and eligibility for ZUMA-1 by mirroring its inclusion and exclusion criteria (supplemental Tables 1 and 2). Axi-cel infusion was considered delayed when there were >3 days between the last day of lymphodepleting therapy and the axi-cel infusion. For toxicity characterizations, we focused on toxicities reported in ZUMA-1 that can be objectively measured and graded for severity (eg, abnormal laboratory tests) using the common terminology criteria for adverse events.37 To ascertain the presence and severity of CRS and ICANS, we developed and validated rule-based algorithms designed according to the grading system described in the American Society for Transplantation and Cellular Therapy (ASTCT) and the National Comprehensive Cancer Network (NCCN) guideline for management of immunotherapy-related toxicities, respectively (supplemental Figures 1-3).3,38,39 For other toxicities, such as hematological, organ function–related, or electrolyte and metabolic toxicities, we identified events with onset occurring within 30 days after infusion. Their duration was estimated based on the time to resolution, defined as when the relevant laboratory values returned to the normal reference range. Prolonged toxicities were defined as those with durations extending beyond 30 days from the onset of the event. Additional laboratory values obtained immediately before and after axi-cel, comorbidities, Charlson Comorbidity Index, and infections developed within 14 days before axi-cel were used in the subsequent predictive analyses for survival outcomes and the decision tree ML model (supplemental Table 3). Laboratory values immediately after infusion were also used to explore serum biomarkers associated with severe CRS and ICANS.
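The timing rules described above (delayed infusion, toxicity onset within 30 days, duration as time to resolution, and prolonged toxicity beyond 30 days) can be illustrated with a minimal sketch. This is not the study's pipeline: the function names, the `(date, value)` record layout, the single lower reference limit, and the example values are all hypothetical and chosen only for illustration.

```python
from datetime import date

def infusion_delayed(last_ld_day, infusion_day):
    """Delayed when >3 days separate the last lymphodepletion day and infusion."""
    return (infusion_day - last_ld_day).days > 3

def toxicity_duration_days(labs, lower_limit, infusion_day, window_days=30):
    """Days from onset to resolution for a 'low value' toxicity.

    labs: list of (date, value) tuples sorted by date (hypothetical layout).
    Onset: first value below lower_limit within window_days after infusion.
    Resolution: first later value back at or above lower_limit.
    Returns None when there is no onset in the window or no recorded resolution.
    """
    onset = None
    for day, value in labs:
        if onset is None:
            days_after = (day - infusion_day).days
            if 0 <= days_after <= window_days and value < lower_limit:
                onset = day
        elif value >= lower_limit:
            return (day - onset).days
    return None

def is_prolonged(duration_days):
    """Prolonged when the duration extends beyond 30 days from onset."""
    return duration_days is not None and duration_days > 30

# Hypothetical platelet counts (x10^9/L) with an assumed lower limit of 150.
infusion = date(2024, 1, 10)
platelets = [
    (date(2024, 1, 12), 160),
    (date(2024, 1, 15), 90),   # onset, 5 days after infusion
    (date(2024, 1, 30), 110),
    (date(2024, 2, 25), 155),  # back in the assumed normal range
]
duration = toxicity_duration_days(platelets, 150, infusion)
print(duration, is_prolonged(duration))
```

In this sketch an unresolved toxicity returns `None` rather than a censored duration; a real analysis would need an explicit rule for patients whose values never return to the reference range during follow-up.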
Table 1. Patient baseline characteristics and eligibility according to ZUMA-1 criteria
| Variables | n (%) |
|---|---|
| UC study cohort | N = 416 |
| Baseline characteristics | |
| Primary disease type, n (%) | |
| DLBCL | 335 (80.5) |
| DLBCL/TFL/PMBCL | 81 (19.5) |
| Age | |
| Median (IQR), y | 63.0 (54-70) |
| ≥65, n (%) | 192 (46.1) |
| Male sex, n (%) | 271 (65.1) |
| BMI (kg/m²) | |
| Median (range) | 25.6 (23.1-28.9) |
| ≥25, n (%) | 245 (58.9) |
| Race, n (%) | |
| White | 232 (55.8) |
| Asian | 43 (10.3) |
| Black or African American, or Native Hawaiian or Other Pacific Islander, or American Indian or Alaska Native | 15 (3.6) |
| Native Hawaiian or Other Pacific Islander, or American Indian or Alaska Native | 5 (1.2) |
| Other or unknown | 126 (30.3) |
| Previous autologous stem cell transplant, n (%) | 31 (7.5) |
| Previous allogeneic stem cell transplant,∗ n (%) | 5 (1.2) |
| CAR-T preparation procedures, median (IQR) | |
| Leukapheresis to infusion days (v2v) | 28 (26-34) |
| Initiation of LD therapy (days before axi-cel) | 5 (5-5) |
| Received LD for >3 days, n (%) | 17 (4.1) |
| Deviation from the recommended axi-cel infusion timeline,† n (%) | 94 (22.6) |
| Last day of LD, median (range) | 3 (3-3) |
| Eligibility at leukapheresis | |
| Inadequate hematological, organ function, or having infection, n (%) | 66 (15.9) |
| Inadequate hematological profile, n (%) | 66 (15.9) |
| ANC of <1000 × 10⁶/L | 13 (3.1) |
| Lymphocyte count of <100 × 10⁶/L | 1 (0.2) |
| Platelet count of <75 000 × 10⁶/L | 18 (4.3) |
| Organ function and infection, n (%) | |
| ALT > 2.5 ULN | 4 (1.0) |
| AST > 2.5 ULN | 2 (0.5) |
| Total bilirubin of >1.5 mg/dL | 4 (1.0) |
| EGFR < 60 mL/min | 23 (5.5) |
| EF < 50% | 1 (0.2) |
| O2 of <92% on room air | 0 (0.0) |
| Active infection | 0 (0.0) |
| Having history or presence of comorbidities,‡ n (%) | 139 (33.4) |
| CNS disorders | 31 (7.5) |
| CNS lymphoma or metastasis | 4 (1.0) |
| Myocardial complications | 81 (19.5) |
| Thrombosis | 51 (12.3) |
| Autoimmune diseases | 30 (7.2) |
| Bridging therapy | |
| Required bridging therapy, n (%) | 165 (39.7) |
| Rituximab-containing regimen | 104 (63.0) |
| RW patients who would not have been eligible and studied in ZUMA-1, n (%) (Patients did not meet eligibility criteria or required bridging therapies) | 239 (57.5) |
ALT, alanine aminotransferase; ANC, absolute neutrophil count; AST, aspartate aminotransferase; BMI, body mass index; CNS, central nervous system; EF, ejection fraction; EGFR, estimated glomerular filtration rate; LD, lymphodepleting; PMBCL, primary mediastinal large B-cell lymphoma; TFL, transformed follicular lymphoma; v2v, vein-to-vein.
∗History of allogeneic stem cell transplants was part of the exclusion criteria described in ZUMA-1.
†Of 94 patients, 87 received axi-cel >3 days after the completion of lymphodepleting therapy; the remaining 7 patients received axi-cel 1 or 2 days after the completion of lymphodepleting therapy.
‡Indicates number and percentage of patients with comorbidities who would have been excluded in ZUMA-1. This includes allogeneic stem cell transplantation listed earlier in the table.
Statistical analyses
Descriptive statistics for patients’ baseline demographics, clinical characteristics, and primary and secondary end points are provided. The Kaplan-Meier method was applied to estimate the median and rates of PFS and OS at 18 months. A multivariable Cox proportional hazards model was fitted with clinically relevant covariates selected based on clinical domain knowledge and published literature (Figure 1).24,25,33,40-52 The treatment centers were treated as a random effect covariate, accounting for the variability of patient populations and clinical care at each center.
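As an illustration of how the Kaplan-Meier method yields a median survival time and a survival rate at a fixed time point, the following minimal sketch implements the product-limit estimator in plain Python. It is not the study's code, and the follow-up times and event indicators are invented toy values.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    times: follow-up time per patient.
    events: 1 = progression or death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all patients sharing this time; censored ones still count as at risk.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def median_survival(curve):
    """First time at which S(t) falls to 0.5 or below; None if not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

def survival_at(curve, t_query):
    """Estimated survival probability at time t_query (step function)."""
    s = 1.0
    for t, surv in curve:
        if t > t_query:
            break
        s = surv
    return s

# Toy data in months (not study data).
toy_times = [2, 3, 3, 5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0, 1, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
print(median_survival(toy_curve), survival_at(toy_curve, 6))
```

A median that is "not reached" corresponds to `median_survival` returning `None`, that is, the estimated curve never falls to 50% during follow-up.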
Figure 1. Effectiveness outcomes and associated risk factors. Kaplan-Meier estimates of (A) PFS and (B) OS for 416 patients overlaid with survival results extracted from the ZUMA-1 study are shown. The x-axis shows days since administration of axi-cel. (C) Potential risk factors associated with PFS using a multivariable Cox proportional hazards regression model are shown. Significant risk factors with P values ≤ .05 (adjusted for multiple hypothesis testing using the Bonferroni method) are highlighted with pink markers. LFTs, liver function tests; NE, not estimable; NR, not reached.
For toxicity comparisons, rates of severe adverse events (AEs) were compared between the UCHDW cohort and ZUMA-1 study using Fisher exact tests. P values ≤ 0.05 were considered statistically significant. To adjust for multiple hypothesis testing across several toxicities, Bonferroni correction was applied. Serum biomarkers associated with severe CRS and ICANS were determined via nonparametric Mann-Whitney U tests comparing patients with and without severe CRS or ICANS using laboratory test results within 1 day after infusion (supplemental Table 3).
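The comparison described above can be sketched as a two-sided Fisher exact test on a 2 × 2 table followed by a Bonferroni adjustment. The sketch below is a plain-Python illustration under the usual definition of the two-sided test (summing the probabilities of all tables that are no more likely than the observed one); it is not the study's implementation, and the function names are hypothetical. The CRS counts are taken from Table 2.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Integer hypergeometric weights avoid floating-point ties.
    weights = {x: comb(row1, x) * comb(row2, col1 - x) for x in range(lo, hi + 1)}
    extreme = sum(w for w in weights.values() if w <= weights[a])
    return extreme / total

def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Severe CRS: 78 of 416 (UC cohort) vs 12 of 108 (ZUMA-1), as listed in Table 2.
p_crs = fisher_exact_two_sided(78, 416 - 78, 12, 108 - 12)
print(p_crs)
```

With Bonferroni correction, a raw P value is compared against the significance level only after being multiplied by the number of toxicities tested, which is why some nominally significant differences may no longer be significant after adjustment.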
Decision tree ML model
A total of 387 patients who received axi-cel at least 6 months before the data extraction date were included to develop the ML model. Early relapse occurred in 164 patients (42.4%). The model was trained to classify patients at higher vs lower risk of early relapse. Demographics, comorbidities, and laboratory test results before, and within 24 hours after, infusion were included as potential features (covariates). Covariates of laboratory tests with missing data were excluded. A total of 32 covariates were initially selected based on domain knowledge and their statistically significant differences between cohorts (supplemental Table 3). Subsequently, we performed feature selection using Lasso regression to identify key features,53 followed by a classification and regression tree (CART) model to classify early relapse risks. The optimal set of hyperparameters for both Lasso and CART was identified using a grid-search method combined with leave-one-group-out cross-validation on the training data set (85% of the original data set). The CART model learned optimal cutoff points for each key feature as the decision rules in a tree (Figure 4A). The remaining data (15%) were divided into a validation set (50%) and an out-of-sample (OOS) test set (50%). The validation set was used for model calibration with isotonic regression, ensuring better alignment between predicted probabilities and observed outcomes. The OOS test set was used to evaluate the model’s performance with the Brier score, calibration plot (supplemental Figure 5), and AUROC. The Brier score quantifies the alignment between the predicted probabilities and the observed outcomes, with lower scores indicating better alignment and more accurate probabilistic predictions. Covariates’ contributions to the model’s predictions were quantified as feature importance (Figure 4C). The model’s clinical utility and net benefit were quantified using decision curve analysis (Figure 4F-G).
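To make the calibration step concrete, the following minimal sketch shows the Brier score and an isotonic (monotone) recalibration fitted with the pool-adjacent-violators algorithm. It is a plain-Python illustration rather than the scikit-learn pipeline used in the study; the function names and toy scores are hypothetical. For brevity the toy example fits and scores on the same data, whereas the study fitted calibration on the validation set and evaluated on the OOS test set.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators fit; returns {raw score: calibrated probability}."""
    # Aggregate tied scores first (a decision tree emits one value per leaf).
    totals = {}
    for s, y in zip(scores, outcomes):
        acc = totals.setdefault(s, [0.0, 0])
        acc[0] += y
        acc[1] += 1
    xs = sorted(totals)
    blocks = []  # each block: [sum of outcomes, patient count, distinct scores]
    for x in xs:
        blocks.append([totals[x][0], totals[x][1], 1])
        # Merge backwards while the fitted means would decrease.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] += k
    fitted = []
    for s, n, k in blocks:
        fitted.extend([s / n] * k)
    return dict(zip(xs, fitted))

# Toy leaf probabilities and observed early-relapse labels (not study data).
raw = [0.1, 0.1, 0.4, 0.4, 0.4, 0.7, 0.7, 0.9]
relapsed = [0, 0, 1, 0, 1, 0, 1, 1]
mapping = fit_isotonic(raw, relapsed)
calibrated = [mapping[s] for s in raw]
print(brier_score(raw, relapsed), brier_score(calibrated, relapsed))
```

Because the identity mapping is itself monotone, the isotonic fit can never have a higher Brier score than the raw scores on the data it was fitted to; whether that improvement carries over to unseen patients is what the held-out test set checks.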
The computational pipeline was developed using Python 3.7.10 with scikit-learn 0.24.0, lifelines 0.26.4, and statkit 1.0.0, and is made publicly available on GitHub.54
Results
Patient characteristics and eligibility assessment
A total of 416 patients who had received axi-cel were included in this study. The median patient age was 63 years (interquartile range [IQR], 54-70), with 46.1% aged ≥65 years (Table 1). All patients were evaluated based on ZUMA-1 eligibility criteria for adequacy of hematological profile, organ function, and comorbidities at leukapheresis, and the requirement of bridging therapies.3 Overall, 239 (57.5%) patients were characterized by aggressive DLBCL requiring bridging therapies (165/416 [39.7%]), inadequate hematological and organ function (51/416 [12.3%]), and comorbidities (139/416 [33.4%]) previously not studied in ZUMA-1 (Table 1).
The median duration between leukapheresis and axi-cel infusion was 28 days (IQR, 26-34). In the standard protocol, lymphodepleting therapy is given on days −5, −4, and −3 from the day of axi-cel infusion. All patients received lymphodepleting therapy with fludarabine and cyclophosphamide. The median interval between the start of lymphodepleting therapy and axi-cel infusion was 5 days. Overall, 94 (22.6%) patients were unable to receive axi-cel 3 days after the lymphodepleting therapy; 16 (4.8%) received >3 days of lymphodepleting therapy.
Effectiveness outcomes and risk factors
The median follow-up was 687 days (22.9 months), median PFS was 302 days (10.1 months), and the PFS rate was 41.6% (95% confidence interval [CI], 36.6-47.3) at 18 months (Figure 1A). The median OS was 1632 days (54.4 months) with an OS rate of 65.5% (95% CI, 60.0-70.5) at 18 months (Figure 1B). The Cox model showed that higher levels of lactate dehydrogenase (HR, 1.31; 95% CI, 1.22-1.41), ferritin (HR, 1.20; 95% CI, 1.13-1.28), and circulating monocytes (HR, 1.11; 95% CI, 1.01-1.21) were significantly associated with worse PFS. Conversely, higher levels of circulating eosinophils (HR, 0.86; 95% CI, 0.77-0.96), hemoglobin (HR, 0.86; 95% CI, 0.80-0.94), and platelets (HR, 0.84; 95% CI, 0.75-0.95) were associated with a longer time to disease progression (Figure 1C).
Toxicity profile and biomarkers associated with severe CRS and ICANS
CRS occurred in 380 of 416 (91.3%) patients (Table 2). Most cases were grade 1 or 2 (72.6%), but 78 (18.8%) patients developed severe CRS, a rate that is higher than that reported in ZUMA-1 (P = .06) but falls within the range of 3% to 27% reported by other RW studies.55,56 Of the patients who experienced CRS, 318 (76.4%) received tocilizumab, and the median time to the first tocilizumab treatment was 3 days (IQR, 2-5) after infusion. ICANS occurred in 312 of 416 (75.0%) patients, of whom 135 (32.5%) had grade ≥3 severity. All patients who developed ICANS received dexamethasone (n = 264) or other glucocorticoids (n = 96). The median time to first dexamethasone treatment was 5 days (IQR, 3.75-6) after infusion. Anakinra, an interleukin-1 antagonist, was administered to 41 patients, all of whom had concurrent CRS and ICANS. Of these, 23 experienced grade <3 CRS and 18 had grade ≥3 CRS, whereas 9 developed grade <3 ICANS and 32 had grade ≥3 ICANS. The median length of hospitalization was 12 days (IQR, 10-17), with a total of 90 (21.6%) patients receiving transfusions during hospitalization. We found that 1 patient (1/416 [0.2%]) developed secondary T-cell lymphoma, with a new diagnosis of peripheral T-cell lymphoma at 28 days after infusion.
Table 2. Rates of adverse drug events and severity during 30 days after axi-cel infusion in RW patients compared with ZUMA-1
| AE and severity | UC∗ (N = 416): any grade (%) | UC∗: grade 1/2 (%) | UC∗: grade 3/4 (%) | ZUMA-1∗,† (N = 108): any grade (%) | ZUMA-1∗,†: grade 1/2 (%) | ZUMA-1∗,†: grade 3/4 (%) | UC vs ZUMA-1¶: P value |
|---|---|---|---|---|---|---|---|
| Hematological/organ function toxicity, n (%) | |||||||
| Any | 415 (99.8) | 396 (95.2) | 391 (94.0) | 108 (100.0) | 2 (1.9) | 97 (89.8) | .0801 |
| Hematological toxicity, n (%) | |||||||
| Anemia¶ | 85 (20.4) | 10 (2.4) | 75 (18.0) | 73 (67.6) | 24 (22.2) | 49 (45.4) | .00001 |
| Leukopenia | 94 (22.6) | 2 (0.5) | 92 (22.1) | 20 (18.5) | 2 (1.9) | 18 (16.7) | .2154 |
| Lymphopenia¶ | 2 (0.5) | 0 (0) | 2 (0.5) | 10 (9.3) | 2 (1.9) | 8 (7.4) | .00001 |
| Neutropenia | 223 (53.6) | 4 (1.0) | 219 (52.6) | 48 (44.4) | 6 (6.0) | 42 (38.9) | .0108 |
| Thrombocytopenia | 208 (50.0) | 55 (13.2) | 153 (36.8) | 38 (35.2) | 12 (11.1) | 26 (24.1) | .0131 |
| Organ function toxicity, n (%) | |||||||
| High AST | 109 (26.2) | 82 (19.7) | 27 (6.5) | 19 (17.6) | 12 (11.1) | 7 (6.5) | .9973 |
| High ALT | 144 (34.6) | 104 (25.0) | 31 (7.5) | 22 (20.4) | 16 (14.8) | 6 (5.6) | .4931 |
| Hypogammaglobulinemia | 263 (63.2) | 263 (63.2) | 0 (0) | 16 (14.8) | 16 (14.8) | 0 (0.0) | 1.0000 |
| Hyperbilirubinemia | 67 (16.1) | 66 (15.9) | 1 (0.2) | 5 (4.6) | 4 (3.7) | 1 (0.9) | .3033 |
| Electrolyte/metabolic toxicity, n (%) | |||||||
| Hypophosphatemia | 175 (42.1) | 52 (12.5) | 123 (29.6) | 31 (28.7) | 11 (10.2) | 20 (18.5) | .0216 |
| Hyponatremia | 235 (56.5) | 146 (35.1) | 89 (21.4) | 38 (35.2) | 26 (24.1) | 12 (11.1) | .0158 |
| Hypokalemia | 135 (32.5) | 94 (22.6) | 41 (9.9) | 36 (33.3) | 33 (30.6) | 3 (2.8) | .0181 |
| Hypocalcemia | 188 (45.2) | 152 (36.5) | 36 (8.7) | 43 (39.8) | 36 (33.3) | 7 (6.5) | .4636 |
| Hypoalbuminemia | 97 (23.3) | 87 (20.9) | 10 (2.4) | 43 (39.8) | 42 (38.9) | 1 (0.9) | .3398 |
| Hyperuricemia | 5 (1.2) | 0 (0) | 5 (1.2) | 2 (1.9) | 2 (1.9) | 0 (0.0) | .5887 |
| Immunological and neurological toxicity, n (%) | |||||||
| CRS | 380 (91.3) | 302 (72.6) | 78 (18.8) | 100 (92.6) | 88 (81.5) | 12 (11.1) | .0595 |
| Tocilizumab exposure‡ | 318 (76.4) | 259 (62.3) | 59 (14.2) | — | — | — | — |
| Anakinra exposure‡ | 41 (9.9) | 23 (5.5) | 18 (4.3) | — | — | — | — |
| Time to first tocilizumab, median (IQR), d | 3 (2-5) | 4 (2-5) | 2 (1-5) | — | — | — | — |
| ICANS | 312 (75.0) | 177 (42.5) | 135 (32.5) | 72 (66.7) | 37 (34.3) | 35 (32.4) | .9930 |
| Dexamethasone exposure§ | 264 (63.5) | 137 (32.9) | 127 (30.5) | — | — | — | — |
| Other glucocorticosteroid exposure§ | 96 (23.1) | 31 (7.5) | 65 (15.6) | — | — | — | — |
| Anakinra exposure§ | 41 (13.1) | 9 (5.1) | 32 (23.7) | — | — | — | — |
| Time to first dexamethasone, median (IQR), d | 5 (3.75-6) | 5 (3-6) | 5 (4-6) | — | — | — | — |
| Other toxicity treatments and measures, n (%) | — | — | — | — | |||
| Filgrastim-sndz | 153 (36.8) | — | — | — | — | — | — |
| Immunoglobulin G | 41 (9.9) | — | — | — | — | — | — |
| Tumor lysis syndrome | 6 (1.4) | — | — | — | — | — | — |
| Infections¶ | 79 (19.0) | — | — | 8 (8) | — | — | .0039 |
| Hospitalization | |||||||
| Length of stay, median (IQR), d | 12 (10-17) | — | — | — | — | — | — |
| Received transfusion | 90 (21.6) | ||||||
| Death¶ | 5 (1.2) | — | — | 9 (8) | — | — | .0001 |
ALT, alanine aminotransferase; AST, aspartate aminotransferase.
Em dash (—) indicates values not applicable or not assessed for that column or cohort.
All values in ZUMA-1 were extracted from the associated publications with a total of 108 trial participants. The missing values in ZUMA-1 columns were not reported.
Unadjusted P values of Fisher exact tests comparing rates of severe (grade ≥3) AE observed in UC and ZUMA-1 for all AEs except hypogammaglobulinemia for which total rates were compared.
Of 416 patients, 380 experienced CRS. The percentages for the exposures of tocilizumab and anakinra were calculated based on these 380 patients.
Of 416 patients, 312 experienced ICANS. The percentages for the exposures to dexamethasone, other glucocorticoids, and anakinra were calculated based on these 312 patients.
Severe AEs with rates observed in UC and ZUMA-1 remained significantly different after correction for multiple-hypothesis testing using the Bonferroni method.
Nearly all patients (415 [99.8%]) developed at least 1 AE within 30 days after infusion. Hypogammaglobulinemia occurred in 63.2% of patients (all grade 1 or 2). The most prevalent severe AEs were neutropenia (52.6%) and thrombocytopenia (36.8%), followed by electrolyte imbalances including hypophosphatemia (29.6%) and hyponatremia (21.4%). Of patients who developed hypogammaglobulinemia, 44 of 263 (16.7%) received IV immunoglobulin replacement therapy. Hepatotoxicity was observed as elevated aspartate aminotransferase, elevated alanine aminotransferase, and hypoalbuminemia in 26.2%, 34.6%, and 23.3% of patients, respectively (Table 2). The median durations of severe anemia, thrombocytopenia, and leukopenia were 154, 52, and 20 days, respectively (Figure 2B). The median durations of severe electrolyte abnormalities, including hypokalemia, hyponatremia, and hypocalcemia, were short, ranging between 1 and 7 days (Figure 2B). In contrast to the severe toxicities, the median durations of mild hypogammaglobulinemia, leukopenia, thrombocytopenia, neutropenia, and anemia were 68, 41, 30, 28, and 21 days, respectively (Figure 2C). The median durations of mild electrolyte abnormalities ranged from 1 to 4 days. Infections, tumor lysis syndrome, and deaths were observed in 79 (19.0%), 6 (1.4%), and 5 (1.2%) patients, respectively (Table 2; supplemental Table 4). The AE burden for each patient highlighted that most patients experienced combinations of AEs concurrently (Figure 2A).
Figure 2. Adverse drug event burden characterization and serum biomarkers. (A) AEs including hematological toxicity, organ function toxicity, other laboratory test abnormalities, CRS, and neurotoxicity during the 30 days after axi-cel infusion are shown (rows) for 416 individual patients (columns), with bars indicating the number of events and colors representing severity. The median durations of (B) milder (grade <3) and (C) severe (grade ≥3) AEs are shown, with the color shades indicating the rates (%) of RW patients developing such AEs.
The biomarker analysis found that lower postinfusion levels of albumin and of circulating lymphocytes, neutrophils, eosinophils, monocytes, and platelets, as well as higher levels of lactate dehydrogenase (LDH) and ferritin, were significantly associated with severe CRS and ICANS (Figure 3A-B). Additionally, lower levels of hemoglobin and higher levels of C-reactive protein (CRP) were significantly associated with severe ICANS (Figure 3B). Higher levels of CRP were also observed with severe CRS, but the association did not reach statistical significance (Figure 3A).
Potential serum biomarkers associated with severe CRS or neurotoxicity. A total of 10 serum biomarkers potentially associated with severe (A) CRS or (B) ICANS with significantly different levels in patients with mild (grade <3) vs severe (grade ≥3) toxicities are shown. ∗P > 1 × 10−2, P ≤ 5 × 10−2; ∗∗P > 1 × 10−3, P ≤ 1 × 10−2; ∗∗∗P > 1 × 10−4, P ≤ 1 × 10−3; ∗∗∗∗P ≤ 1 × 10−4.
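The asterisk notation in the legend above corresponds to 4 nested P-value bands. As a minimal illustrative sketch (the function name `significance_stars` is hypothetical and is not part of the study's analysis code), the mapping could be written as:

```python
def significance_stars(p: float) -> str:
    """Map a P value to the asterisk bands given in the Figure 3 legend."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    if p <= 1e-4:
        return "****"   # P <= 1 x 10^-4
    if p <= 1e-3:
        return "***"    # 1 x 10^-4 < P <= 1 x 10^-3
    if p <= 1e-2:
        return "**"     # 1 x 10^-3 < P <= 1 x 10^-2
    if p <= 5e-2:
        return "*"      # 1 x 10^-2 < P <= 5 x 10^-2
    return ""           # above .05: no asterisk


# Illustrative P values only; these are not results from the study.
for p in (0.03, 0.004, 0.0002, 0.00001, 0.2):
    print(p, repr(significance_stars(p)))
```

Checking the bands from most to least stringent means each threshold has to be tested only once.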
Decision tree ML model and early relapse prediction
The CART ML model achieved a high AUROC score of 0.87 (standard deviation, 0.05) in the training data set and 0.82 (95% CI, 0.72-0.97) in the OOS testing data set for classifying patients who would develop early relapse within 6 months after receiving axi-cel (Figure 4B). Key contributing features (covariates) for the decision tree ML model were age and levels of LDH, ferritin, CRP, hematocrit, platelet count, and prothrombin time measured within 24 hours after infusion (Figure 4C). The calibration plot showed that the model’s predicted probabilities of patients’ risk of early relapse were overall aligned with the observed probabilities (supplemental Figure 5). However, the model tended to slightly overestimate the risk in the probability range of 0.3 to 0.5. After calibration, the Brier score decreased from 0.19 to 0.16, reflecting better calibration and prediction accuracy. The decision tree of our CART ML model is displayed in Figure 4A. The ML-predicted subgroups, group A (patients who remained free of progression) and group B (patients who experienced early relapse), demonstrated significantly different PFS curves (log-rank test, P < .05; Figure 4D) but did not reach a statistically significant difference in OS (Figure 4E). The decision curve analysis (Figure 4F-G) demonstrated that the model provided a positive net benefit compared with “treat-all-patients” and “treat-none” strategies across a broad range of decision thresholds (0-0.7).
Decision tree ML model predicts early relapse within 6 months after axi-cel using age and 6 first laboratory values within 24 hours immediately after axi-cel infusion. (A) The decision tree ML model developed to identify patients with higher risks of experiencing early relapse after receiving axi-cel. (B) The ROC curves for each leave-1-group-out validation and OOS testing data set. (C) The 7 patient-level predictors important to the decision tree ML model’s predictions. Kaplan-Meier estimates of (D) PFS and (E) OS of groups A and B predicted by the ML model in OOS testing data set. Groups A and B represent subgroups of patients without or with risk of experiencing relapse within 6 months after receiving axi-cel, respectively. The x-axis shows days since administration of axi-cel. (F) Decision curve analysis of the ML model across all threshold probabilities and (G) between 0 and 0.4. AUC, area under the curve; CV, crossvalidation; PT, prothrombin time; ROC, receiver operating characteristic.
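For readers less familiar with the Brier score and the net-benefit quantity underlying decision curve analysis, the following minimal sketch shows how both are conventionally computed. It uses the standard textbook definitions, not the study's own code, and the function names and toy data are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome (0/1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients whose predicted risk >= threshold."""
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds implied by the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: intervene on every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (assumption, for illustration only): 1 = early relapse, 0 = no early relapse.
y_true = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1, 0.7, 0.55, 0.05]

print("Brier score:", round(brier_score(y_true, y_prob), 4))
for t in (0.1, 0.3, 0.5, 0.7):
    model = net_benefit(y_true, y_prob, t)
    treat_all = net_benefit_treat_all(y_true, t)
    # The "treat-none" strategy has a net benefit of 0 at every threshold.
    print(f"threshold={t}: model={model:.3f}, treat-all={treat_all:.3f}, treat-none=0.000")
```

A model is considered useful at a given threshold when its net benefit exceeds both reference strategies, which is the comparison summarized in panels F and G.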
Discussion
A key contribution of this study is the development of a novel, interpretable ML model that uses only age and 6 routine postinfusion laboratory values to identify patients at high risk of early relapse after axi-cel infusion. The model achieved strong predictive performance (AUROC score of 0.87 in crossvalidation, 0.82 in out-of-sample testing) and demonstrated generalizability across 5 medical centers in the UC Health. By stratifying patients into low- and high-risk groups (group A vs group B), the model revealed significantly worse PFS outcomes in the high-risk group, suggesting it may be clinically actionable for early intervention planning. These findings reinforce the potential value of integrating computational tools to improve patient stratification and optimize outcomes in patients receiving axi-cel.
In our retrospective cohort of 416 patients, the median follow-up was 687 days (22.9 months), median PFS was 302 days (10.1 months), and the PFS rate was 41.6% (95% CI, 36.6-47.3) at 18 months (Figure 1A). The median OS was 1632 days (54.4 months), with an OS rate of 65.5% (95% CI, 60.0-70.5) at 18 months (Figure 1B). These results are consistent with previously published outcomes from both clinical trials and RW studies. The ZUMA-1 trial reported a 39.1-month median follow-up with median PFS and OS of 5.9 and 25.8 months, respectively, and 18-month PFS and OS rates of 41% and 52%, respectively; severe CRS and ICANS occurred in 12% and 35% of patients, respectively.3,5 In a larger RW CIBMTR study (Jacobson et al), 57% of patients would have been ineligible for ZUMA-1. With a median follow-up of 12.9 months, the study reported median PFS and OS of 8.6 and 21.8 months, 12-month PFS and OS rates of 47.3% and 62.3%, and 24-month rates of 39.2% and 49.5%, respectively; severe CRS and ICANS were observed in 8% and 24% of patients, respectively.57 Similarly, 57.5% of our cohort would have been ineligible for ZUMA-1, but our cohort had longer median follow-up (22.9 vs 12.9 months) and improved median PFS (10.1 vs 8.6 months) and OS (54.4 vs 21.8 months). Our 18-month PFS (41.6%) and OS (65.5%) rates fell within the CIBMTR study’s 12- to 24-month range.57 These differences may reflect selection bias of lower-risk patients at UC Health during this interval or experience-dependent improvements in outcomes. Further studies are needed to understand how treatment protocols across UC Health may affect survival outcomes.
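The landmark survival rates and medians above are Kaplan-Meier quantities (as in Figure 4D-E). The sketch below is a minimal, generic product-limit estimator on toy data; it is not the study's analysis code, and the function names are hypothetical. The 30-day month is an assumption inferred from the reported day/month pairs (eg, 687 days and 22.9 months).

```python
DAYS_PER_MONTH = 30  # assumption inferred from the reported day/month pairs


def kaplan_meier(times, events):
    """Return [(time, survival)] steps; events: 1 = event observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


def survival_at(curve, t):
    """Estimated survival probability at time t (step function)."""
    s = 1.0
    for time, surv in curve:
        if time > t:
            break
        s = surv
    return s


def median_survival(curve):
    """First time at which survival drops to 0.5 or below; None if not reached."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


# Toy follow-up data in days (assumption, for illustration only).
times = [100, 200, 300, 400, 540, 600, 700, 800]
events = [1, 1, 0, 1, 1, 0, 0, 0]
curve = kaplan_meier(times, events)
print("18-month rate:", survival_at(curve, 18 * DAYS_PER_MONTH))
print("median (days):", median_survival(curve))
print("687 days in months:", round(687 / DAYS_PER_MONTH, 1))
```

Censored patients leave the risk set without lowering the survival estimate, which is why patients with short follow-up still contribute information up to their last observation.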
Our rates of severe CRS (18.8%) and ICANS (32.5%) were comparable with those in ZUMA-1 and other RW studies. Notably, our ICANS detection algorithm was based on NCCN’s neurotoxicity treatment guideline rather than the ASTCT consensus criteria.58 We also confirmed high rates of prolonged hematological toxicities, including thrombocytopenia, anemia, and leukopenia, as reported in other studies,5,31,59 potentially due to a higher proportion of older patients and those with medically complex conditions with aggressive diseases excluded from ZUMA-1. Our findings confirmed the presence of persistent toxicities after axi-cel infusion, particularly severe anemia and thrombocytopenia, with a median duration of 154 and 52 days, respectively, consistent with previous reports.60-62 These results underscore the need for careful monitoring and supportive care, especially in older or medically complex populations. Although broader RW patient groups may benefit from axi-cel, proactive toxicity management remains essential to optimize outcomes. Gurney et al recently reported that older age and lower platelet and hemoglobin may increase the risks of developing myeloid neoplasms after CAR-T therapies.63 Although secondary malignancies were not the primary focus of our study, we identified 1 case (0.2%) of secondary T-cell lymphoma diagnosed 28 days after infusion. This patient had low baseline platelet, hemoglobin, and erythrocyte counts, consistent with potential risk factors. Given the known but rare occurrence of secondary lymphoid neoplasms after CAR-T therapy, this finding, although based solely on structured EHR diagnosis codes without histopathologic confirmation, underscores the importance of continued long-term surveillance and further investigation into patient-level risk factors in broader RW populations.
We explored the association of routine laboratory markers and patient characteristics with clinical outcomes after axi-cel treatment. These analyses aimed to identify accessible biomarkers that may predict both survival and toxicity, and thus help guide postinfusion management. The Cox model revealed that higher LDH, a known poor prognostic factor,64 was associated with shorter time to progression. Patients aged ≥65 years were underrepresented in ZUMA-1 (24%) compared with our cohort (46.1%). In our analysis, age of ≥65 years was associated with longer PFS; although this association did not reach statistical significance, it could contribute to achieving comparable survival outcomes in RW populations. Similar trends were previously observed in ZUMA-1 and 2 large RW studies.3,57,65 In contrast, Chihara et al reported significantly shorter event-free survival in patients aged ≥75 years.30
Elevated ferritin was associated with worse PFS, in addition to its known links with severe CRS and ICANS. Conversely, higher levels of circulating platelets, hemoglobin, and eosinophils were associated with longer PFS. This finding has been reported in only a few small RW studies.20 Eosinophils have been suggested as prognostic markers for longer survival outcomes in other immunotherapy contexts but remain understudied in axi-cel.66 Baseline blood counts may reflect the number of previous lines of treatment received, which is associated with disease severity and refractoriness. We found that higher monocyte levels were associated with shorter PFS, consistent with findings that monocyte depletion in leukapheresis may improve outcomes.67
We also evaluated postinfusion laboratory values to identify potential serum biomarkers associated with toxicity. Higher postinfusion levels of the proinflammatory biomarker ferritin were associated with severe CRS, as reported previously,68-70 but CRP was significantly associated only with severe ICANS. Furthermore, we found that low postinfusion albumin levels were associated with the development of both severe CRS and ICANS. Lower postinfusion circulating lymphocytes, eosinophils, neutrophils, monocytes, and platelets were also associated with both severe CRS and ICANS. This adds to the evidence of a relationship between postinfusion hematological toxicities, such as thrombocytopenia, and both CRS and ICANS.31,59,71-73 Lower levels of hemoglobin were also associated with severe ICANS. As previously discussed, we found that higher preinfusion levels of circulating platelets and hemoglobin were associated with longer PFS, suggesting that baseline hematological profiles are associated with both safety and effectiveness outcomes.
The decision tree ML model achieved a high AUROC score of 0.87 (standard deviation, 0.05) during crossvalidation and 0.82 (95% CI, 0.72-0.97) in OOS testing, using only age and 6 routinely measured laboratory tests immediately after infusion. To our knowledge, this is the first predictive ML model, using only routinely measured RW data, that predicts risk of early relapse. Previous studies linked long-term remission outcomes to factors such as treatment response within the first few months, CAR-T expansion, tumor burden and location, and receipt of lymphodepleting therapy.74 Other retrospective studies found that age, performance status, levels of LDH, and total metabolic tumor volume (TMTV) were highly predictive of early relapse.75 However, many of these factors, such as treatment response, CAR-T expansion, or TMTV, are labor intensive or require longer time to assess. Our decision tree ML model presents a novel, interpretable approach to early risk stratification. The model demonstrated strong generalizability across UC centers, maintaining high AUROC scores during the leave-one-group-out crossvalidation and the evaluation in the OOS data set (Figure 4B). It is worth highlighting that the model relies only on age and 6 routinely measured laboratory values collected within 24 hours immediately after infusion. This simplicity enhances its potential generalizability and applicability to diverse populations and clinical settings. The decision curve analysis also demonstrated positive net benefit across a wide range of threshold probabilities (Figure 4F-G), highlighting its flexibility and utility in different clinical scenarios, in which thresholds for intervention may vary depending on the risk tolerance of the treating physicians, patient preferences, and resource availability.
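To make the evaluation scheme concrete, the sketch below illustrates leave-one-group-out splitting (each center held out once) and a rank-based AUROC. It is an illustrative outline under stated assumptions rather than the study's pipeline: the function names, site labels, and scores are hypothetical, and in a real analysis the scores for each held-out center would come from a model fitted on the remaining centers.

```python
def leave_one_group_out(groups):
    """Yield (held_out_group, train_indices, test_indices) once per distinct group."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


def auroc(y_true, y_score):
    """Probability that a random positive scores higher than a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    # Ties between a positive and a negative count as half a concordant pair.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (assumption, for illustration only).
groups = ["site_1", "site_1", "site_2", "site_2", "site_3", "site_3"]
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.8, 0.3, 0.6, 0.7, 0.9, 0.2]  # stand-in for out-of-fold predictions

for site, train_idx, test_idx in leave_one_group_out(groups):
    fold_auc = auroc([y_true[i] for i in test_idx], [y_score[i] for i in test_idx])
    print(f"held out {site}: n_train={len(train_idx)}, n_test={len(test_idx)}, AUROC={fold_auc:.2f}")

print("pooled AUROC:", round(auroc(y_true, y_score), 3))
```

Holding out whole centers, rather than random patients, tests whether performance carries over to a site the model has never seen, which is the sense of generalizability discussed above.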
Importantly, the model stratified patients into 2 distinct risk groups based on the inferred decision tree cutoff points, with group A (low-risk) demonstrating significantly longer PFS than group B (high-risk; Figure 4D). This underscores the potential utility of the model not only for individualized prediction but also for population-level stratification that may inform targeted postinfusion monitoring or considerations for preemptive therapies. Although our ML model demonstrated strong performance in both internal validation and OOS testing across 5 medical centers within the UC Health using routinely collected postinfusion laboratory values, it was developed using retrospective data. As such, its clinical utility should be interpreted with appropriate caution. Prospective validation in an independent cohort will be critical to confirm the model’s predictive accuracy, generalizability, and clinical utility in RW settings. Such studies will also be needed to evaluate RW integration, physician adoption, and impact on clinical decision-making, especially in guiding timely interventions for potential adjuvant or salvage therapies after axi-cel to improve survival outcomes in subgroups of patients at highest early relapse risk. These efforts are essential to support the translational potential of this model from retrospective research to an actionable clinical tool.
Limitations
This study is subject to the biases inherent in retrospective observational studies, and any suggestion of causality could be confounded by unmeasurable covariates. Furthermore, the survival analyses could be limited by potential immortal time bias in cases in which patients received axi-cel shortly before the data extraction date, possibly leaving too little follow-up to observe an outcome event. In addition, all analyses were based on structured data collected from EHR instances across the UC Health, centralized and harmonized in the UCHDW. Hence, care that patients received outside of UC Health could not be assessed, which could contribute to information bias and loss. Additionally, certain variables necessary to further characterize clinical conditions, such as infection types, details surrounding secondary T-cell lymphoma, or the clinical rationale for anakinra use, reside in unstructured clinical notes and were thus unavailable at the time of this study. Furthermore, patients included in this study were treated at independent UC centers without a centrally designated care protocol or treatment plan. Therefore, site-specific biases in patient selection or care are likely to exist. Detailed information related to lymphoma status, performance status, tumor volume, previous lines of therapies, response to bridging therapies, and treatment response to axi-cel was difficult to ascertain because it largely resides in unstructured clinical text. Additionally, this retrospective study was conducted with patients who received axi-cel, which could potentially overestimate clinical outcomes compared with an intent-to-treat study design.
Our algorithms for CRS and ICANS were developed and validated using only the UCSF-specific subcohort and then applied across the other UC centers (supplemental Figures 1-3). Variations in clinical practice could affect the accuracy of ICANS severity grading because the NCCN treatment guideline may not be adopted across all centers. Although ASTCT guidelines are the standard for grading CRS and ICANS,58 only our CRS algorithm adhered fully to those criteria. Developing a structured data–based ICANS algorithm was more challenging, because key ASTCT grading elements (eg, immune effector cell-associated encephalopathy scores, neurological assessments) are typically found in unstructured clinical notes, which were not available or harmonized across the UC system at the time of this study. As such, we used the NCCN neurotoxicity guideline to define ICANS events using structured data. Although we refer to these events as “ICANS” to align with terminology used in previous studies, they should be interpreted as “ICANS-like” events identified through a proxy approach. This constitutes an important limitation when comparing neurotoxicity rates with those of other studies using full ASTCT-based grading. Although our study population is not as large as the CIBMTR data set, our data are enriched with detailed and routinely measured laboratory test results, which collectively represent patients’ physiological changes, toxicities, response to treatments, and health status. These readily available EHR data, if confirmed to have prognostic value, could be used to identify patients who may benefit from additional monitoring, adjuvant, maintenance, or early salvage therapy. Despite these limitations, the potential generalizability and high performance across UC centers suggest significant promise of using ML modeling to stratify clinical risk of meaningful adverse outcomes and relapse.
These findings justify the initiation of additional prospective confirmatory studies in larger and more diverse patient populations to further validate the robustness and clinical utility of the ML model.
Conclusion
Relying solely on computational approaches, our assessments of RW performance of axi-cel in a multicenter cohort across the UC Health mirrored many key findings reported in other RW studies. This study proposed a novel approach to enhance clinical care for patients receiving axi-cel. Our decision tree ML model (AUROC score of 0.82) demonstrates clinical utility by identifying patients at risk for early relapse using only age and 6 routine postinfusion laboratory tests. With prospective confirmatory studies, this ML model has the potential to guide clinical decision-making and enhance patient outcomes.
Acknowledgments
The authors thank the staff of the Bakar Computational Health Sciences Institute, University of California San Francisco (UCSF) Information Commons, and the Center for Data-driven Insights and Innovation within the University of California Health System. The authors thank the UCSF Clinical and Translational Science Institute (grant UL1TR001872) for their help retrieving clinical notes at UCSF for chart review and validation.
M.W. was supported by the US National Institutes of Health (NIH) T32 training grant (5T32GM007175-43). Partial grant support was provided through the US Food and Drug Administration (FDA) U01FD005978 to the UCSF Stanford Center of Excellence in Regulatory Sciences and Innovation, the UCSF Clinical and Translational Science Institute (UL1TR001872), and the NIH National Institute of General Medical Sciences (5T32GM007175-43).
The content of this study reflects the views of the authors and should not be construed to represent views or policies of the NIH or FDA.
The authors dedicate this article to Atul J. Butte, as an inspiring physician, collaborator, mentor, and friend. Atul J. Butte, a pioneering physician-scientist and leader in health informatics and data science, died before the submission of this revised manuscript.
Authorship
Contribution: M.W., K.V.K., and A.J.B. contributed to the conception and design; M.W., D.D., P.C., R.V., B.R., A.J.B., and K.V.K. contributed to data analysis and interpretation; M.W., K.V.K., and A.J.B. drafted the manuscript; and all authors contributed to the critical review of the manuscript and approved the final manuscript.
Conflict-of-interest disclosure: A.J.B. was a cofounder and consultant to Personalis and NuMedii; was a consultant or advisor to the National Institutes of Health (NIH), Journal of the American Medical Association, Mango Tree Corporation, Samsung, Geisinger Health, Washington University in Saint Louis, University of Utah, and, in the recent past, 10x Genomics, Helix, Pathway Genomics, and Verinata (Illumina); served on paid advisory panels or boards for Regenstrief Institute, Gerson Lehman Group, AlphaSights, Covance, Novartis, Genentech, Merck, and Roche; was a shareholder in Personalis and NuMedii; was a minor shareholder in Apple, Meta (Facebook), Alphabet (Google), Microsoft, Amazon, Nvidia, Advanced Micro Devices, Snap, 10x Genomics, Doximity, Regeneron, Sanofi, Pfizer, Royalty Pharma, Moderna, BioNtech, Invitae, Pacific Biosciences, Editas Medicine, Eli Lilly, Nuna Health, Assay Depot (Scientist.com), Vet24seven, Snowflake, Sophia Genetics, and several other nonhealth-related companies and mutual funds; received honoraria and travel reimbursement for invited talks from Johnson and Johnson, Roche, Genentech, Pfizer, Merck, Lilly, Takeda, Varian, Mars, Siemens, Optum, Abbott, Celgene, AstraZeneca, AbbVie, Westat, Applied Research Works, Acentrus, Analytical, Life Science & Diagnostics Association, and many academic institutions, medical or disease-specific foundations and associations, and health systems; received royalty payments through Stanford University, for several patents and other disclosures licensed to NuMedii and Personalis; and received research funding from NIH, US Food and Drug Administration (FDA), Peraton (as the prime investigator on an NIH contract), Priscilla Chan and Mark Zuckerberg, the Barbara and Gerson Bakar Foundation, Genentech, Johnson and Johnson, Chan Zuckerberg Science, Robert Wood Johnson Foundation, Leon Lowenstein Foundation, Intervalien Foundation, the March of Dimes, Juvenile Diabetes Research Foundation, California Governor’s Office of Planning and Research, California Institute for Regenerative Medicine, L’Oreal, and Progenity. K.V.K. has served as an ad hoc consultant to Janssen, Genentech/Roche, Incyte, Bristol Myers Squibb, Cargo Therapeutics, CRISPR, Sanofi, Optum Health, Celgene, Avacta Therapeutics, Aegle Therapeutics, and CellChorus. M.W. reports research funding from FDA, Amgen, Merck, and BeiGene. None of the aforementioned entities had any role in the design, planning, or execution of the study, or interpretation of the findings. The remaining authors declare no competing financial interests.
Atul J. Butte died on 13 June 2025.
Correspondence: Krishna V. Komanduri, Division of Hematology and Oncology, Department of Medicine, University of California San Francisco, 400 Parnassus Ave, #A453, San Francisco, CA 94143; email: krishna.komanduri@ucsf.edu.
References
Author notes
Aggregated, deidentified data are available on request from the author, Michelle Wang (michelle.wang2@ucsf.edu).
The full-text version of this article contains a data supplement.





