Background: Identification of blood cell types is crucial for diagnosing and managing hematological diseases such as lymphoma and leukemia. Microscopic examination of blood smears remains essential for detecting morphological abnormalities (Vardiman J et al., Blood 2002). However, manual classification is labor-intensive, prone to inter-observer variability, and difficult to scale to large datasets (Tian Z et al., Photonics 2022). Efforts to automate blood cell analysis date back to the 1940s, when Wallace Coulter introduced a method in which cells suspended in a conductive solution pass through an orifice, producing a change in electrical impedance that correlates with cell volume (Coulter WH, US patent 2656508A, 1953). This principle remains the foundation of automated cell counting systems used today. Despite recent advances in machine learning for cancer detection, blood cell identification remains challenging owing to variation in cell size, class frequency, sample preparation, and imaging conditions. Contemporary approaches, including TW-YOLO built on YOLOv8-small, have improved detection accuracy through attention modules and enhanced feature extraction (Zhang D et al., Sensors 2024). This study evaluates the efficacy of a YOLOv11-large (YOLOv11l) model with integrated spatial attention for simultaneous localization and classification of 13 white blood cell subtypes, trained on the multidomain LeukemiaAttri dataset (Rehman A et al., arXiv 2024).

Methods: A total of 18,664 annotated images from the LeukemiaAttri dataset were split into training (70%), validation (15%), and testing (15%) sets. To mirror real-world variability, the training set was expanded to 65,785 images through extensive data augmentation, including horizontal and vertical flips, rotations, shearing, adjustments to saturation, brightness, and exposure, Gaussian blur, and noise addition. A pretrained YOLOv11l model was fine-tuned on the augmented dataset for 250 epochs at 640 × 640 resolution. Training used an auto-selected optimizer, a momentum of 0.937, a weight decay of 5 × 10⁻⁴, the AugMix augmentation strategy, Automatic Mixed Precision (AMP) for faster convergence, and mosaic augmentation for the first 240 epochs. The batch size was determined automatically to maximize hardware utilization. End-to-end training took approximately 23.5 hours on a single NVIDIA H200 SXM GPU. The final detector performs subtype classification and spatial localization simultaneously in a single forward pass.
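
As an illustration of the training configuration described above, a minimal sketch using the Ultralytics Python API is shown below. The dataset configuration file name (leukemia_attri.yaml) is a hypothetical placeholder, and the offline augmentation of the training set is assumed to have been performed beforehand; this is not the study's exact training script.

    from ultralytics import YOLO

    # Load a pretrained YOLOv11-large checkpoint (weights file name per Ultralytics naming)
    model = YOLO("yolo11l.pt")

    # Fine-tune on the augmented LeukemiaAttri training set; the data YAML path is a placeholder
    model.train(
        data="leukemia_attri.yaml",  # hypothetical config listing the 13 WBC subtype classes
        epochs=250,                  # total training epochs
        imgsz=640,                   # 640 x 640 input resolution
        batch=-1,                    # auto batch size to maximize GPU utilization
        optimizer="auto",            # optimizer selected automatically by the trainer
        momentum=0.937,              # momentum reported in Methods
        weight_decay=5e-4,           # weight decay reported in Methods
        amp=True,                    # Automatic Mixed Precision
        close_mosaic=10,             # disable mosaic for the last 10 epochs (active for the first 240)
        device=0,                    # single GPU
    )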

Results: Evaluated on the held-out test set after 250 training epochs, the model achieved a precision of 93.6% and a recall of 89.2%. Localization accuracy was assessed with Intersection over Union (IoU), which quantifies the overlap between a predicted bounding box and the ground-truth box as the ratio of their intersection area to their union area. The model achieved a mean Average Precision at an IoU threshold of 0.50 (mAP50) of 93.8%; this metric evaluates both classification and localization, counting a detection as correct if its IoU is at least 50%. The final mean Average Precision averaged over IoU thresholds of 0.50-0.95 (mAP50-95), a stricter metric that summarizes performance across progressively more demanding overlap requirements, was 77.6%. Performance peaked at epoch 244 with the highest mAP50-95 of 77.9% and an F1 score of 0.913, the harmonic mean of precision (94.1%) and recall (88.8%). In clinical practice, scores above 0.80 are generally considered strong, although thresholds vary with risk tolerance. Class-wise average precision was uniformly high (89.3%–98.2%), confirming reliable recognition across subtypes.

Conclusions: Leveraging a YOLOv11l backbone, our detector achieved a mAP50-95 of 77.9% at its best-performing epoch and an overall F1 score of 0.913 across all subtypes in the LeukemiaAttri dataset. The results demonstrate accurate localization and fine-grained classification, even for infrequent cell subtypes. Overall, the model provided high-accuracy analysis of complex smears, identifying regions of interest and generating objective, quantitative data that could streamline microscopic review. Its reproducible training protocol, reliability, and efficient inference support its potential as a decision-support tool. Benchmarking inference performance and validating the model on external cohorts from diverse real-world environments are the next steps toward demonstrating generalizability and enabling integration into hematology platforms.
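
To make the detection metrics reported in the Results concrete, the following minimal Python sketch (illustrative only, not the study's evaluation code) computes IoU for a pair of boxes and the F1 score from precision and recall; the function names and box coordinates are made-up values for illustration.

    def iou(box_a, box_b):
        # Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    def f1_score(precision, recall):
        # Harmonic mean of precision and recall
        return 2 * precision * recall / (precision + recall)

    # A detection counts toward mAP50 when IoU with its matched ground-truth box is >= 0.50
    print(round(iou((30, 40, 130, 140), (35, 45, 140, 150)), 2))  # 0.75
    print(round(f1_score(0.941, 0.888), 3))  # 0.914; consistent with the reported 0.913 up to rounding of precision/recall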
