Background

Clinical trial data are a critical and valuable, but difficult to access, source of information about new medicines. Privacy and intellectual property concerns limit the availability of trial data for secondary, exploratory analysis. Generative models trained on historical clinical trial data can produce synthetic datasets that preserve patient privacy while maintaining dataset characteristics. This gives analysts access to clinical insights without compromising privacy. Uses of these data include designing clinical trials, generating evidence to support subgroup analyses or matched external controls, and augmenting trial data sources for machine learning.

We report how synthetic data from CAR T trials are used to model safety and efficacy in a trial setting, with a special focus on analyses to design a mitigation strategy for prolonged leukopenia following CAR T infusion.

Methods

Synthetic Data Generation

Synthetic data were generated using Simulants, an algorithm that creates synthetic patients by permuting features among similar patients. This approach significantly outperforms deep-learning approaches for clinical trial data. Patient-level data were synthesized from multiple completed clinical trials from the Medidata Clinical Cloud.
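The idea of permuting features among similar patients can be illustrated with a minimal sketch. This is not the Simulants implementation, whose details are not given here; the function name `synthesize`, the neighbourhood size `k`, the Euclidean distance on unscaled numeric features, and the toy cohort are all assumptions made for illustration.

```python
import random


def synthesize(patients, k=3, seed=0):
    """Build one synthetic record per real patient.

    Each feature of the synthetic record is copied from a randomly chosen
    member of the patient's neighbourhood (the patient plus its k nearest
    peers). Illustrative only: assumes every feature is numeric.
    """
    rng = random.Random(seed)
    features = list(patients[0].keys())

    def distance(a, b):
        # Plain Euclidean distance; a real pipeline would scale features first
        return sum((a[f] - b[f]) ** 2 for f in features) ** 0.5

    synthetic = []
    for p in patients:
        # Sorting by distance puts the patient itself first (distance 0)
        neighbours = sorted(patients, key=lambda q: distance(p, q))[: k + 1]
        # Draw every feature independently from the neighbourhood
        synthetic.append({f: rng.choice(neighbours)[f] for f in features})
    return synthetic


# Hypothetical toy cohort, not trial data
cohort = [{"age": 40 + i, "baseline_wbc": 3.0 + 0.2 * i} for i in range(12)]
synthetic_cohort = synthesize(cohort, k=3, seed=1)
```

Because each feature is drawn independently, a synthetic record usually mixes values from several real patients, while every individual value stays within the range seen among similar patients.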

Clinically relevant analysis planning

In collaboration with academic and industry experts, we translated analytical insights into practical advancements in treatment safety and efficacy. Applications include designing lymphodepletion strategies to optimize CAR T therapy efficacy, predicting severe Cytokine Release Syndrome (CRS), and analyzing co-occurrence patterns of CRS and immune effector cell-associated neurotoxicity syndrome (ICANS).

We focused on prolonged leukopenia and immune-system recovery following CAR T therapy because this is a common side effect associated with life-threatening infections that is difficult to study without access to patient-level data. While information on adverse events, such as infections, or point-in-time reports of abnormal blood cytology may be reported in published trial reports as graded adverse events, blood test dynamics are not.

Analytical approaches

Analysis methods included:

  • Descriptive analyses of the temporal evolution of blood analytes

  • Risk factor analysis using models to find factors predictive of prolonged leukopenia or that impact leukocyte recovery

  • Intervention modeling through analysis of treatment approaches observed in the trial data, such as use of Granulocyte-Colony Stimulating Factor
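As a rough illustration of the intervention-modeling step, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from patient-level records. The abstract does not state which models were used, so this is only one simple option; the field names `early_gcsf` and `prolonged_leukopenia` and the function name are hypothetical.

```python
import math


def gcsf_odds_ratio(records, z=1.96):
    """Unadjusted odds ratio of prolonged leukopenia for early G-CSF use.

    `records` is a list of dicts with boolean fields `early_gcsf` and
    `prolonged_leukopenia` (hypothetical names). Returns (OR, lower, upper)
    using the Wald interval on the log odds ratio.
    """
    a = sum(1 for r in records if r["early_gcsf"] and r["prolonged_leukopenia"])
    b = sum(1 for r in records if r["early_gcsf"] and not r["prolonged_leukopenia"])
    c = sum(1 for r in records if not r["early_gcsf"] and r["prolonged_leukopenia"])
    d = sum(1 for r in records if not r["early_gcsf"] and not r["prolonged_leukopenia"])
    if min(a, b, c, d) == 0:
        raise ValueError("every cell of the 2x2 table must be non-zero")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

An odds ratio below 1 whose interval excludes 1 would be consistent with early G-CSF being associated with less prolonged leukopenia. An unadjusted estimate like this does not account for confounding, which a regression model with covariates would need to address.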

Results

Synthetic Data Fidelity and Privacy

Synthetic data demonstrated high fidelity to the source datasets (Silhouette score: -0.083, Bag of words R²: 0.99), ensuring the reliability of downstream analyses while protecting individual privacy (Membership disclosure AUC ROC: 0.62).

Clinically meaningful insights

Distinct patterns in leukocyte dynamics were observed in patients who exhibited prolonged leukopenia as compared to those who recovered quickly. Leukocyte counts initially dropped sharply in all recovery groups post-infusion. Patients who recovered then showed a consistent increase in leukocytes, whereas counts in patients who did not recover fluctuated below the leukopenia threshold. Patients with partial recovery plateaued around Day 50 with minimal late recovery.
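One way such recovery groups could be derived from longitudinal counts is sketched below. The abstract does not give the group definitions, so the rule, the threshold of 4.0 (in units of 10⁹ cells/L), the Day 30 cut-off for the late window, and the function name `classify_recovery` are all assumptions for illustration.

```python
def classify_recovery(series, threshold=4.0, late_start_day=30):
    """Assign a recovery group from (day, leukocyte_count) measurements.

    Hypothetical rule: look only at measurements on or after
    `late_start_day` and compare each one with `threshold`.
    """
    late = [count for day, count in series if day >= late_start_day]
    if not late:
        return "unknown"  # no follow-up in the late window
    above = sum(1 for count in late if count >= threshold)
    if above == len(late):
        return "recovered"
    if above == 0:
        return "not recovered"
    return "partial recovery"
```

A call such as `classify_recovery([(0, 5.1), (7, 0.8), (30, 2.9), (60, 3.2)])` looks only at the Day 30 and Day 60 values, both of which sit below the assumed threshold.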

Elevated post-infusion ferritin and pre-infusion leukocyte counts were identified as significant predictors of prolonged leukopenia.

Early Granulocyte-Colony Stimulating Factor (G-CSF) administration within the first 30 days of CAR T therapy was associated with less long-term leukopenia, highlighting the potential benefit of early intervention.

Cross validation of findings in synthetic data with source data

The same analysis was conducted in parallel on the source data. We observed qualitatively similar results, which reinforces the reliability of the synthetic data for analysis. Further, no statistically significant differences were observed in any key findings between the source and synthetic data.

Conclusion

Synthetic clinical trial data can be used in place of source data to develop clinically meaningful insights for CAR T patient management. Because synthetic data carry far fewer risks to the privacy of patients or sponsors, this approach could feasibly and significantly enhance the availability of clinical trial data and accelerate the discovery of new therapeutic approaches.

Disclosures

Lafeuille: Medidata, a Dassault Systèmes company: Current Employment. Shafquat: Medidata, a Dassault Systèmes company: Current Employment. Sang: Medidata, a Dassault Systèmes company: Current Employment. Beigi: Medidata, a Dassault Systèmes company: Current Employment. Maura: Sanofi: Consultancy, Honoraria; Medidata: Consultancy, Honoraria. Aptekar: Medidata, a Dassault Systèmes company: Current Employment.
