Background. The emergence of Generative AI is expanding the use of Synthetic Data (SD) for ground-breaking applications, such as digital twins for evidence generation and synthetic control arms. However, SD adoption is limited by technical barriers and an unclear regulatory validation process, largely because robust tools and standardized approaches to assess its clinical applicability are lacking. This gap is especially evident in hematology, where leveraging large-scale, multimodal data is essential to develop personalized treatments and address unmet needs in rare diseases such as Myeloid Neoplasms (MN). This study presents SAFE (Synthetic vAlidation FramEwork), a comprehensive framework for evaluating multimodal SD in terms of statistical fidelity, clinical utility and privacy preservation, validated in the MN clinical setting.

Methods. We applied SAFE to SD generated from the TITAN cohort (n=20,054), a retrospective multimodal dataset comprising 7,104 AML, 8,410 MDS, 2,986 MDS/MPN and 1,554 MF cases, with clinical, genomic, transcriptomic (bulk RNA-seq) and histopathological image data. SAFE was developed within the SYNTHEMA and SYNTHIA consortia as a modular, extensible, Python-based solution comprising three main analysis modules, each tailored to a distinct data modality: safe.tabular, safe.series (for longitudinal data) and safe.images. Each module evaluates statistical fidelity using metrics specific to the clinical modality and data type, and assesses privacy preservation by checking that synthetic records can neither be linked to real patients nor replicate them. SAFE also introduces a synthetic RNA-seq validation pipeline specifically designed to handle the biological complexity of transcriptomic data. By integrating a clinically driven layer across data modalities, SAFE provides disease-specific, interpretable insights into SD usability. In the MN setting, this was demonstrated by using SD for disease classification and personalized prognostic evaluation through the MOSAIC framework (PMID: 38875514).

Results. We generated a multimodal synthetic cohort (n=20,054) using the TRAIN SD platform (www.train-ai.eu), accurately mirroring the real dataset's disease stratification. We applied the safe.tabular framework and summarized validation performance for each modality through key metrics: Clinical Synthetic Fidelity (CSF), Genomic Synthetic Fidelity (GSF), Clinical Synthetic Utility (CSU), Transcriptomics Synthetic Fidelity (TSF) and Privacy Synthetic Score (PSS). These were combined into an overall SAFE score, using optimal thresholds between 85% and 95% to balance data accuracy and privacy. The analysis revealed high concordance for clinical feature distributions and correlations (CSF: 91%), as well as for genomic alterations and pairwise gene associations (GSF: 88%). Clinical utility was evaluated using the MOSAIC framework, which showed that synthetic patients had outcomes comparable to those of real patients in unsupervised patient stratification, prognostic scoring and survival analysis (log-rank p-value=0.8), and when conventional scoring systems were applied (CSU: 90.2%). The quality and biological fidelity of synthetic RNA-seq were validated through the distribution of transcriptomic profiles, differential expression and enrichment analyses, using Jaccard, Dice and Spearman correlation metrics; these confirmed that synthetic expression accurately reflects the functional alterations associated with clinical conditions (TSF: 88%). Privacy was assessed across modalities via Distance to Closest Record and Nearest Neighbor Distance Ratio, confirming a low re-identification risk (PSS: 86%). For digital pathology slides, safe.images achieved a Fréchet Inception Distance of 8.3 and Multi-Scale Structural Similarity Index values ranging from 0.028 to 0.216, indicating good realism and intra-class diversity. Extracted morphological, color and Haralick features also showed comparable distributions.
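To make two of the metric families named above concrete, the following is a minimal sketch of set-overlap scores (Jaccard, Dice) and of the two privacy distances (Distance to Closest Record, Nearest Neighbor Distance Ratio). It is not SAFE's implementation: the function names, the use of Euclidean distance on numeric features, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections (e.g. gene lists)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) of two collections."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def dcr_and_nndr(synthetic, real):
    """Per-synthetic-record DCR and NNDR against the real records.

    Assumes numeric, already-scaled features and Euclidean distance
    (hypothetical choices), and at least two real records.
    """
    synthetic = np.asarray(synthetic, dtype=float)
    real = np.asarray(real, dtype=float)
    # Pairwise Euclidean distances, shape (n_synthetic, n_real).
    diffs = synthetic[:, None, :] - real[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Distances to the closest and second-closest real record.
    nearest_two = np.sort(dists, axis=1)[:, :2]
    dcr = nearest_two[:, 0]
    second = nearest_two[:, 1]
    # NNDR = closest / second closest; set to 1.0 when both distances are 0.
    nndr = np.divide(dcr, second, out=np.ones_like(dcr), where=second > 0)
    return dcr, nndr


# Toy data, purely illustrative.
real_records = [[0.0, 0.0], [3.0, 4.0], [10.0, 10.0]]
synthetic_records = [[0.0, 0.0], [3.0, 0.0]]
dcr, nndr = dcr_and_nndr(synthetic_records, real_records)
```

In this sketch a DCR of 0 flags a synthetic record that exactly copies a real one, and an NNDR close to 0 indicates a synthetic record that sits much nearer to one real patient than to any other. How SAFE converts such distances into the PSS percentage is not described in the abstract.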
The overall SAFE score of 89% indicated high-quality generation, and all results were compiled into an automated, interpretable report.
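The abstract does not state how the five module scores are combined into the overall SAFE score. As a simple arithmetic check, and assuming an unweighted mean (an assumption, not the documented method), the reported values give a figure consistent with 89%:

```python
# Module scores as reported in the Results (percentages).
scores = {"CSF": 91.0, "GSF": 88.0, "CSU": 90.2, "TSF": 88.0, "PSS": 86.0}


def unweighted_mean(values):
    """Arithmetic mean; the aggregation SAFE actually uses may differ."""
    values = list(values)
    return sum(values) / len(values)


overall = unweighted_mean(scores.values())
print(f"Unweighted mean of module scores: {overall:.2f}%")
```

The mean is 443.2 / 5 = 88.64%, which rounds to 89%. A weighted or threshold-based aggregation could yield the same rounded value, so this check does not identify the method.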

Conclusions. SAFE advances standard validation by embedding clinical expertise into its design, and its application to the TITAN dataset demonstrated its robustness. By offering a comprehensive, disease-specific evaluation of fidelity and clinical utility, SAFE stands out from existing tools, supporting reliable clinical research and potentially informing regulatory adoption of AI-generated evidence in hematology.
