Systematic Review of Machine Learning Models for Myelodysplastic Syndrome Diagnosis

Desai, Karna; Sharma, Rohit; Croce, Phillip; Hanif, Ahmad; Thalji, Mousa; ElManzalawi, Yasser

doi:10.1182/blood-2024-211964

Introduction

Artificial intelligence (AI) and machine learning (ML) are increasingly applied in healthcare, where they are used to improve diagnostic accuracy and predict outcomes, particularly for complex conditions such as myelodysplastic syndromes (MDS). MDS are a group of hematopoietic disorders characterized by ineffective blood cell production and a propensity to progress to acute myeloid leukemia (AML). Diagnosis traditionally relies on comprehensive blood tests, bone marrow examination, and detailed patient history. ML models offer a promising alternative, using large datasets to identify patterns and predict the presence of disease. This systematic review evaluates the performance of ML models in diagnosing MDS, focusing on their accuracy, sensitivity, and specificity.

Methods

PubMed and other relevant databases were searched for studies that used ML models to diagnose MDS. The protocol followed PRISMA guidelines to support a systematic and unbiased selection of studies. Eligible studies applied ML techniques to clinical and laboratory data for MDS diagnosis and reported performance metrics such as area under the curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE).
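The four metrics named in the inclusion criteria can be sketched as follows. This is a minimal illustration using the standard definitions, not code from any reviewed study; the class `ConfusionCounts`, the function names, and all counts and scores below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Model calls compared against the reference diagnosis."""
    tp: int  # MDS cases the model called MDS
    fp: int  # non-MDS cases the model called MDS
    tn: int  # non-MDS cases the model called non-MDS
    fn: int  # MDS cases the model called non-MDS


def sensitivity(c: ConfusionCounts) -> float:
    """SEN: fraction of true MDS cases that the model detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """SPE: fraction of non-MDS cases that the model correctly rejects."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """ACC: fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, not taken from any reviewed study.
example = ConfusionCounts(tp=45, fp=5, tn=95, fn=5)
print(sensitivity(example))          # 0.9
print(specificity(example))          # 0.95
print(round(accuracy(example), 3))   # 0.933
print(round(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))  # 0.889
```

Unlike SEN, SPE, and ACC, which depend on a chosen decision threshold, AUC summarizes ranking quality across all thresholds, which is one reason it is often the only metric a study reports.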

Results

Fourteen studies that used ML models to diagnose MDS were identified. The studies varied in their approaches and drew on different data sources, including bone marrow samples, peripheral blood samples, and flow cytometry data. The models employed included convolutional neural networks (CNN), decision trees, random forests, gradient boosting machines (GBM), and elastic net, among others. The performance metrics reported across these studies were AUC, ACC, SEN, and SPE.

Performance Metrics Range:

AUC: 0.8 to 0.996
Sensitivity (SEN): 0.765 to 0.992
Specificity (SPE): 0.837 to 1

Top 5 Models:

1. Wang, M. et al. [19]

Model: CNN
Data Source: American Society of Hematology image bank and hospital bone marrow smear (BMS) samples
Outcomes: Diagnosing MDS
Performance:
Internal Validation: AUC 0.985, ACC 0.914, SEN 0.992, SPE 0.881
External Validation: AUC 0.942, ACC 0.921, SEN 0.886, SPE 0.938

2. Lee, N. et al. [20]

Model: CNN
Data Source: Hospital BMS
Outcomes: Detecting dysplastic erythrocytes, granulocytes, megakaryocytes, and blasts
Performance (detecting dysplastic granulocytes):
Internal Validation: AUC 0.996, ACC 0.993, SEN 0.9, SPE 0.999

3. Kimura, K. et al. [25]

Model: CNN with XGBoost
Data Source: Hospital peripheral blood smear (PBS) data
Outcomes: Diagnosing MDS and distinguishing it from aplastic anemia (AA)
Performance:
Internal Validation: AUC 0.99, ACC >0.900, SEN 0.962, SPE 1

4. Herbig, M. et al. [29]

Model: Random forest
Data Source: University hospital real-time deformability cytometry (RT-DC) data
Outcomes: Predicting MDS
Performance:
Internal Validation: AUC 0.95, ACC 0.91, SEN 0.86, SPE 1

5. Radakovich, N. et al.

Model: GBM
Data Source: Multi-center data including Cleveland Clinic, Munich Leukemia Laboratory, and University of Pavia
Outcomes: Diagnosing MDS and other myeloid neoplasms
Performance: AUC 0.951

These top models illustrate the potential of ML to diagnose MDS with high accuracy, sensitivity, and specificity, making them promising candidates for clinical application. Of the five, only Wang et al. [19] is listed with external validation results.
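The figures listed for the top five models can be tabulated and checked against the reported metric ranges. The sketch below only transcribes values from the Results text; `None` marks metrics not given there, and the names `REPORTED_RANGES`, `TOP_MODELS`, `out_of_range`, and `rank_by_auc` are illustrative assumptions. The abstract does not state how the top five were ranked, so sorting by AUC is one possible ordering, not necessarily the authors' criterion.

```python
# Metric ranges as reported across the 14 reviewed studies.
REPORTED_RANGES = {"auc": (0.8, 0.996), "sen": (0.765, 0.992), "spe": (0.837, 1.0)}

# Values transcribed from the top-five listing; None = not reported there.
TOP_MODELS = [
    {"study": "Wang et al. (internal)", "auc": 0.985, "sen": 0.992, "spe": 0.881},
    {"study": "Wang et al. (external)", "auc": 0.942, "sen": 0.886, "spe": 0.938},
    {"study": "Lee et al. (granulocytes)", "auc": 0.996, "sen": 0.9, "spe": 0.999},
    {"study": "Kimura et al.", "auc": 0.99, "sen": 0.962, "spe": 1.0},
    {"study": "Herbig et al.", "auc": 0.95, "sen": 0.86, "spe": 1.0},
    {"study": "Radakovich et al.", "auc": 0.951, "sen": None, "spe": None},
]


def out_of_range(models, ranges):
    """Return (study, metric, value) for every reported value outside its range."""
    problems = []
    for m in models:
        for metric, (lo, hi) in ranges.items():
            value = m[metric]
            if value is not None and not (lo <= value <= hi):
                problems.append((m["study"], metric, value))
    return problems


def rank_by_auc(models):
    """Sort entries from highest to lowest AUC."""
    return sorted(models, key=lambda m: m["auc"], reverse=True)


print(out_of_range(TOP_MODELS, REPORTED_RANGES))  # [] : all values fall within range
for entry in rank_by_auc(TOP_MODELS):
    print(f'{entry["auc"]:.3f}  {entry["study"]}')
```

Keeping unreported metrics as `None` rather than zero avoids treating a missing value as a poor result when comparing studies.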

Conclusion

This systematic review highlights the diverse ML approaches used to diagnose MDS, with CNN models the most frequently used. The models generally exhibit high AUC, sensitivity, and specificity, indicating their potential to improve diagnostic accuracy. However, variability in data sources and validation methods underscores the need for standardized protocols to ensure consistent performance across clinical settings. Further research should focus on external validation and on integrating these models into clinical practice to support early and accurate diagnosis of MDS.
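To make the point about external validation concrete, the shift between internal and external results can be computed for the one model reported with both, Wang et al. [19]. The figures are transcribed from the Results; the function name `validation_shift` is an illustrative assumption.

```python
# Wang et al. [19]: metrics as reported for internal and external validation.
INTERNAL = {"AUC": 0.985, "ACC": 0.914, "SEN": 0.992, "SPE": 0.881}
EXTERNAL = {"AUC": 0.942, "ACC": 0.921, "SEN": 0.886, "SPE": 0.938}


def validation_shift(internal, external):
    """External minus internal value for each metric, rounded to 3 decimals."""
    return {name: round(external[name] - internal[name], 3) for name in internal}


shift = validation_shift(INTERNAL, EXTERNAL)
for metric, delta in shift.items():
    print(f"{metric}: {delta:+.3f}")
# AUC: -0.043
# ACC: +0.007
# SEN: -0.106
# SPE: +0.057
```

In this single example, AUC and sensitivity are lower on external data while accuracy and specificity are slightly higher, which shows why internal results alone may not predict performance in a new setting.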

Disclosures

No relevant conflicts of interest to declare.

2024

ASH Publications

American Society of Hematology
