Schematic representation of the data collection and analysis. (A) Data from different sources, including baseline tests, routine laboratory tests, and recurrent mutations, were combined to construct a heterogeneous data set. The prediction point was set at 3 months postdiagnosis, and clinical outcomes were predicted. (B) The clinical outcomes were death, treatment, the combined event of treatment or infection (composite), and infection. (C) Based on the combination of feature sets, 4 models were defined: (1) IPI, which included the CLL-IPI score and the CLL-IPI features only; (2) +BL, which included CLL-IPI features, baseline tests, and routine laboratory tests; (3) +MUT, which included CLL-IPI features and recurrent mutations; and (4) ALL, which included all features. (D) Clinical outcomes were predicted at 2- and 5-year outlooks postdiagnosis (excluding the first 3 months). (E) The data from different sources were merged to create one data set. Then, for a specific outcome and outlook, the target values were created and later used in training and testing. Based on the model, feature extraction was performed. A stacked ML model consisting of 7 algorithms and a fusion stage based on majority voting was trained and tested. The performance of the models and the contribution of the features were estimated to identify the risk factors predictive of each combination of outcome, model, and outlook. tNGS, targeted next-generation sequencing.
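The fusion stage in panel E can be illustrated with a minimal sketch of majority voting over the class labels produced by 7 base models. The caption does not name the 7 algorithms or describe their outputs, so the function name `majority_vote`, the binary label encoding (1 = event, 0 = no event), and the prediction values below are all hypothetical and serve only to show the voting rule.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Fuse per-model class labels by majority vote.

    predictions[m][i] is the label that model m assigns to patient i.
    (Hypothetical helper; not taken from the source.)
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    fused = []
    for i in range(n_samples):
        # Count the votes cast for each label for patient i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        # Keep the label with the most votes. With an odd number of
        # binary voters (such as 7) a tie cannot occur.
        fused.append(votes.most_common(1)[0][0])
    return fused


# Hypothetical outputs of 7 base models for 4 patients (illustrative only).
base_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
]
fused = majority_vote(base_predictions)
print(fused)
```

Using an odd number of base models for a binary outcome means every patient receives a strict majority, so no tie-breaking rule is needed in this sketch.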