Prediction Of Allogeneic Hematopoietic Stem Cell Transplantation (allo-HSCT) Related Mortality in Acute Leukemia: Generation Of a Machine Learning-Based Model Using The Data Set of The Acute Leukemia Working Party (ALWP) Of The EBMT

Shouval, Roni; Labopin, Myriam; Bondi, Ori; Shamay, Hila Mishan; Shimoni, Avichai; Ciceri, Fabio; Esteve, Jordi; Giebel, Sebastian; Gorin, Norbert-Claude; Schmid, Christoph; Zakaria, Imane; Moukhtari, Leila; Polge, Emmanuelle; Al-Jurf, Mahmoud; Kröger, Nicolaus; Craddock, Charles; Bacigalupo, Andrea; Cornelissen, Jan; Baron, Frederic; Unger, Ron; Nagler, Arnon; Mohty, Mohamad

doi:10.1182/blood.V122.21.409.409

Abstract

Background

Allo-HSCT has been shown to increase survival and improve cure rates in acute leukemia (AL). However, the procedure is accompanied by high rates of morbidity and mortality. Several risk scores based on conventional statistical methods may aid decisions about which patients should undergo allo-HSCT and how it should be performed, but these methods carry inherent limitations that may lead to suboptimal candidate selection. Machine learning (ML) is a field of computer science stemming from artificial intelligence and is part of the data mining approach to data analysis. ML algorithms are commonly applied in technological and commercial settings. They can cope with complex data scenarios and thus may be suitable for outcome prediction in allo-HSCT. Against this background, and using an ML prediction method, the alternating decision tree (ADT) algorithm, we developed interpretable models for overall mortality (OM) and treatment-related mortality (TRM) at day +100 after allo-HSCT in AL.

Patients and Methods

A total of 28,995 adult allo-HSCT recipients from the registry of the ALWP of the EBMT were analyzed. Twenty-two variables were available, including year of transplant (range, 2000-2011), diagnosis (acute myeloid leukemia or acute lymphoblastic leukemia), disease status, Karnofsky performance status, conditioning regimen (myeloablative or reduced-intensity conditioning), graft type (peripheral blood, bone marrow or cord blood), donor and recipient HLA compatibility, CMV serostatus, and GVHD prophylaxis regimen. Per study definitions, the primary outcomes to be predicted were OM and TRM at day +100 after allo-HSCT. The complete dataset was split into 3 sets: a training set (n=11,600), a testing set (n=8,688) and a validation set (n=8,707). The ADT prediction model was tested and optimized on the first 2 sets and validated on the last one. Output from the ADT model included the selected variables with assigned scores in a tree-based structure and the area under the ROC curve (AUC), a measure of model discrimination.
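As an illustration of how a tree-based structure with assigned scores yields a single patient score, the following is a minimal sketch of alternating-decision-tree scoring in general: each rule whose precondition holds contributes one of two prediction values, and the score is their sum plus a root value. The rules, thresholds, weights and field names below are hypothetical and invented for the example; they are not the rules or weights of the models described in this abstract.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Patient = Dict[str, object]


@dataclass
class Rule:
    """One decision node of an alternating decision tree with its two prediction values."""
    precondition: Callable[[Patient], bool]  # the rule is evaluated only if this holds
    condition: Callable[[Patient], bool]     # the test performed at the decision node
    value_if_true: float                     # added to the score when the test passes
    value_if_false: float                    # added to the score when the test fails


def adt_score(patient: Patient, root_value: float, rules: List[Rule]) -> float:
    """Sum the root value and the prediction value of every reachable rule."""
    score = root_value
    for rule in rules:
        if rule.precondition(patient):
            score += rule.value_if_true if rule.condition(patient) else rule.value_if_false
    return score


def always(patient: Patient) -> bool:
    """Precondition of rules attached directly to the root."""
    return True


# Hypothetical rules and weights, for illustration only.
rules = [
    Rule(always, lambda p: p["age"] >= 50, 0.4, -0.2),
    Rule(always, lambda p: p["karnofsky"] < 80, 0.6, -0.1),
    # Nested rule: only reachable for patients aged 50 or older.
    Rule(lambda p: p["age"] >= 50,
         lambda p: p["conditioning"] == "myeloablative", 0.3, -0.3),
]

patient = {"age": 57, "karnofsky": 90, "conditioning": "reduced-intensity"}
score = adt_score(patient, root_value=-0.5, rules=rules)
print(f"score = {score:.2f}")
```

Because every contribution is attached to a named variable and threshold, the score can be traced back to individual rules, which is the sense in which such a model is interpretable.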

Results

Each of the ADT models selected 12 of the 22 variables for prediction of OM and TRM at day +100. Ten variables were common to both prediction models, although they were assigned different weights. These were age, diagnosis, disease status and Karnofsky performance status (all at time of transplant), donor-recipient HLA matching, number of transplants per center per year, year of transplant, conditioning regimen, and the donor's and the patient's CMV serostatus. Variables selected exclusively by the OM prediction model were graft type and donor-patient CMV serostatus match, whereas the TRM model alone selected time from diagnosis to transplant and donor-recipient sex match. Applying the models to the validation set yielded AUCs of 0.701 (95% confidence interval [CI], 0.691-0.710) for OM prediction and 0.67 (95% CI, 0.66-0.68) for TRM prediction. The ADT prediction models assigned scores that correlated with patient outcome. For each model, patients in the validation set were grouped according to their score range and prediction success. A higher score was associated with a higher rate of the measured outcome in both models (Figure 1 and Figure 2).

Figure 1

Figure 2
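The two evaluation steps described in the Results, discrimination measured by AUC and observed outcome rate per score range, can be sketched as follows. This is a minimal illustration on made-up toy data, not the study's analysis: the scores, outcomes and band edges are hypothetical, and the AUC is computed with the standard pairwise (rank-based) definition rather than with any particular software the authors may have used.

```python
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], events: Sequence[int]) -> float:
    """Probability that a patient with the event scores higher than one without it.

    Ties between a case and a non-case count as half a win.
    """
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def event_rate_by_band(
    scores: Sequence[float], events: Sequence[int], edges: Sequence[float]
) -> List[Tuple[float, float, int, float]]:
    """For each half-open band [lo, hi), return (lo, hi, n, observed event rate)."""
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_band = [e for s, e in zip(scores, events) if lo <= s < hi]
        if in_band:  # skip empty bands
            rows.append((lo, hi, len(in_band), sum(in_band) / len(in_band)))
    return rows


# Toy data, invented for illustration: model scores and day +100 outcome (1 = died).
scores = [0.5, 1.2, 2.8, 3.1, 4.4, 5.9, 6.3, 6.8]
events = [0, 0, 0, 1, 0, 1, 1, 1]

toy_auc = auc(scores, events)
bands = event_rate_by_band(scores, events, edges=[0, 2, 4, 6, 8])

print(f"AUC = {toy_auc:.4f}")
for lo, hi, n, rate in bands:
    print(f"score [{lo}, {hi}): n={n}, observed rate={rate:.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking, which gives a scale for reading the reported values of 0.701 and 0.67.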

Conclusions

We present two new models, based on the ADT ML algorithm, for prediction of OM and TRM at day +100 after allo-HSCT. The models are robust, as they rely on a large number of samples and a large validation set. As shown in the figures, higher scores correlated with poorer outcome, reaching more than 50% mortality for a score range of 5.76-7 in the OM prediction model. The AUC was better for OM than for TRM, possibly because the higher event rate in the former makes it easier to predict. Improving predictive ability will probably require evaluating more variables, as the limit on maximal predictive performance most likely lies in the information carried by the variables rather than in the sample size or the algorithm used. This work is currently in progress, in particular the combination with other risk scores linked to comorbidities.

In summary, our models can aid candidate selection for allo-HSCT by providing a measurable score that correlates with transplant success.

Disclosures:

Schmid: Novartis: Honoraria, Research Funding, Other: travel grant; Roche: Other: travel grant; MSD: Honoraria.

2013
