Key Points
Ferritin is the best test for predicting bone marrow iron stores in patients with hematologic disorders. Other blood tests do not add value.
Low ferritin indicates iron deficiency. However, ferritin or other blood tests rarely rule out iron deficiency.
Abstract
Although iron-deficiency anemia is common, interpreting iron laboratory test results can be challenging in patients with comorbidities. We aimed to study the accuracy of common iron biomarkers compared with bone marrow iron staining in a large retrospective data set of patients with hematologic disorders. From Helsinki University Hospital electronic health records, we collected bone marrow iron staining results for 6610 patients (median age, 66 years) together with their concurrent ferritin, transferrin saturation, soluble transferrin receptor, transferrin, hemoglobin, and mean red blood cell volume results. In receiver operating characteristic analysis, ferritin had the highest area under the curve (AUC) in predicting reduced bone marrow iron: 88% (95% confidence interval [CI], 86-90) for females and 89% (95% CI, 87-91) for males. A ferritin cutoff of 30 μg/L gave high specificity (97% in females and 99% in males), but sensitivity was only 54% and 35%, respectively. Other studied biomarkers had inferior AUCs. Multivariate logistic regression models did not perform significantly better than ferritin alone. With 50% preprobability for reduced iron stores, a ferritin of 30 μg/L (females) and 51 μg/L (males) had a 95% positive predictive value for reduced iron stores. A 95% negative predictive value was achieved at 1750 μg/L (females) and 4967 μg/L (males). In our large population study, ferritin was the best single biomarker for iron deficiency in secondary care. Adding other blood tests in a multivariate model did not improve performance. However, in these patients with hematologic disorders, even a high ferritin did not rule out iron deficiency with 95% certainty.
Introduction
One-fourth of the world’s population currently has anemia.1,2 Anemia is defined by the World Health Organization as a hemoglobin concentration of <130 g/L for males, <120 g/L for nonpregnant females, and <110 g/L for pregnant females.1 The most common cause of anemia is iron deficiency, which contributes to approximately half of the cases in females worldwide and more than half of the cases in high-income regions.2 Iron-deficiency anemia (IDA) is associated with increased mortality and morbidity.3-5 In addition, reliable detection of iron deficiency is clinically important because it may indicate other diseases such as gastrointestinal cancer.4,6
Given that iron is stored in the bone marrow, bone marrow aspiration has been considered the gold standard for diagnosing iron deficiency, but it is invasive and costly.7,8 More commonly, surrogate plasma or serum biomarkers are used to assess anemia and iron stores: ferritin, transferrin saturation (TSAT), soluble transferrin receptor (sTfR), hemoglobin, and erythrocyte mean cell volume (MCV). These biomarkers diagnose IDA with high accuracy in healthy adult populations.4 However, interpretation of these laboratory tests is not always straightforward because of multiple confounding factors, including patient-specific, preanalytical, and analytical laboratory assay considerations.9,10 For example, inflammation can increase ferritin concentrations and decrease TSAT, and erythropoietic activity affects sTfR levels. Cutoffs depend on the reference population and laboratory methods, and consensus on their use in patients with comorbidities (eg, inflammation or cancer) is lacking.6,11
Of these laboratory tests, ferritin has been compared most often with bone marrow iron stores in published studies. However, although a recent systematic review7 included 6059 patients with ferritin and bone marrow iron data, the diagnostic efficacy of the commonly used decision-making cutoff of 30 μg/L6 was studied in only 512 participants. Other studied cutoffs were heterogeneous, ranging from 12 to 200 μg/L.7 Although ferritin has demonstrated satisfactory specificity at most cutoffs, its sensitivity has been inadequate. Consequently, patients with iron deficiency may present with elevated ferritin levels.7
We aimed to assess the agreement of the iron biomarkers ferritin, TSAT, sTfR, and MCV with the gold standard for iron stores, the bone marrow Prussian blue stain, in a large real-life population of patients with hematologic disorders. Furthermore, we aimed to determine and illustrate the predictive values of single biomarkers and multivariate models for iron deficiency.
Materials and methods
Materials
In this retrospective registry-based study, we used 12 years (2009-2020) of Helsinki University Hospital (HUS) electronic health records available in the HUS data lake infrastructure. Since 2009, standardized, full-length interpretation data of bone marrow aspirates have been available in electronic patient records. For the study population, we included patients who had a bone marrow aspirate sample with iron staining (Prussian blue) taken. Patient age, sex, and International Classification of Diseases, Tenth Revision, diagnoses were recorded. We excluded patients with no diagnosis data, those treated with IV iron supplementation within 3 months before or 1 month after the aspirate, and those who had any diagnosis related to pregnancy or childbirth up to 1 year before or after the aspirate. Inclusion and exclusion criteria are shown in Figure 1. In HUS, bone marrow aspirates are interpreted by specialized laboratory hematologists (n = 10 during the study period) using standard phrasing. The details of the interpretation and reporting are presented in supplemental Figure 1.
Flowchart of the reports of the bone marrow aspirates. The samples were from the Helsinki University Hospital between 2009 and 2020, and the number of aspirates remaining in statistical analysis of ferritin, TSAT, sTfR, transferrin, hemoglobin, and MCV.
Methods
Bone marrow aspirate interpretations were categorized with a regular expression algorithm, based on keywords and phrases in the aspirate report, into 4 iron storage categories (none, low, normal, and abundant) or as undefined. Cases with undefined iron storage were excluded. We verified the accuracy of the categorization algorithm by selecting 100 random aspirates from the defined categories and reading the whole aspirate report to check whether the assigned category corresponded to the report. We then calculated the proportion of correctly classified results with binomial exact confidence intervals (CIs). In the verification, all the samples were correctly classified by the algorithm (100%; 95% CI, 96-100). Given the lack of standardized methodology for bone marrow iron staining interpretation, we assessed the variability of the interpretations: 3 independent laboratory hematologists analyzed a random set of samples (n = 105) from 2012 to 2020 (supplemental Table 1). The observed inter-rater variability is attributable to the subjective nature of the interpretation process. There was 84% agreement between the consensus and the original report. In our study, we mainly used only 2 categories, that is, reduced (none and low) and full (normal and abundant) iron stores, and for these 2 categories, the agreement was as high as 98%. No time-related change was observed (P = .81).
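The exact binomial interval reported for the verification (100 of 100 correct; 95% CI, 96-100) can be reproduced with a short calculation. The following is a minimal Python sketch of the Clopper-Pearson (exact) interval, solved by bisection on the binomial tail probabilities. It is an illustration only and not the authors' R code; the function names `binom_cdf` and `exact_ci` are hypothetical.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(x, n, level=0.95):
    """Clopper-Pearson interval for x successes out of n trials."""
    alpha = 1 - level

    def upper_tail(p):   # P(X >= x), increasing in p
        return 1.0 - binom_cdf(x - 1, n, p)

    def lower_tail(p):   # P(X <= x), decreasing in p
        return binom_cdf(x, n, p)

    def bisect(f, increasing):
        lo, hi = 0.0, 1.0
        for _ in range(200):
            mid = (lo + hi) / 2
            # Move the bracket toward the p where f(p) equals alpha / 2.
            if (f(mid) < alpha / 2) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else bisect(upper_tail, True)
    upper = 1.0 if x == n else bisect(lower_tail, False)
    return lower, upper

lower, upper = exact_ci(100, 100)
print(f"{100 * lower:.1f}% to {100 * upper:.1f}%")
```

When every sample is correct (x = n), the lower bound has the closed form (α/2)^(1/n), which for n = 100 is approximately 0.964 and agrees with the reported lower limit of 96%.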
For each aspirate, we selected biomarker results with the phlebotomy date closest to the aspiration date. Only aspirates with a date difference of 30 days or less were included when studying a biomarker. In the case of multivariate analysis, only aspirates with all studied biomarkers were included. If the patient had several qualifying aspirates, the first among the aspirates with reduced iron stores was chosen (Figure 1). If none of the patient’s aspirates had reduced iron stores, then the first was chosen.
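The two selection rules described above (closest phlebotomy within 30 days, and one aspirate per patient) can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; the function names, the tuple layout, and the tie-breaking behavior (the first listed result wins when two samples are equally close) are assumptions, because the text does not specify them.

```python
from datetime import date

def closest_result(aspirate_date, results, max_days=30):
    """Pick the (sample_date, value) pair closest in time to the aspirate.

    Returns None when no result lies within max_days of the aspirate.
    Ties are resolved by list order (an assumption).
    """
    in_window = [
        (abs((sample_date - aspirate_date).days), sample_date, value)
        for sample_date, value in results
        if abs((sample_date - aspirate_date).days) <= max_days
    ]
    if not in_window:
        return None
    _, sample_date, value = min(in_window, key=lambda item: item[0])
    return sample_date, value

def choose_aspirate(aspirates):
    """Pick one aspirate per patient from (aspirate_date, reduced_iron) pairs.

    The first aspirate with reduced iron stores is chosen; if there is
    none, the first aspirate overall is chosen.
    """
    reduced = [a for a in aspirates if a[1]]
    pool = reduced if reduced else aspirates
    return min(pool, key=lambda a: a[0])

# Hypothetical example data
ferritin = [(date(2015, 5, 1), 40.0), (date(2015, 6, 10), 35.0), (date(2015, 7, 1), 50.0)]
print(closest_result(date(2015, 6, 15), ferritin))
```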
Plasma ferritin, iron, transferrin, and sTfR were analyzed with (1) Roche Hitachi Modular (F. Hoffmann-La Roche, Basel, Switzerland) from 2009 to 2016, (2) Abbott Architect (Abbott Laboratories, Abbott Park, IL) from 2016 to 2019, and (3) Siemens Atellica Solution (Siemens Healthineers, Erlangen, Germany) from 2019 to 2020 with standard procedures as part of patient care. The methods for ferritin, transferrin, and sTfR were immunochemical photometry, and for plasma iron, the method was photometry based on the ferrozine reaction. TSAT was calculated with the following formula: TSAT (%) = 3.825 × plasma iron (μmol/L) / transferrin (g/L). Hemoglobin and MCV were analyzed with Sysmex K-, XE-, and XN-series analyzers (Sysmex, Kobe, Japan) with the methods of photometry and impedance, respectively. All the blood count samples were collected into EDTA tubes; the other samples were collected mostly into lithium-heparin tubes, with <1% collected into serum tubes. Fasting was recommended for TSAT and transferrin sampling. All blood samples were taken and analyzed in the same accredited laboratory organization (HUS Diagnostic Center; ISO 15189).
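As a worked example of the TSAT formula given above, a minimal Python sketch follows. The function name `tsat_percent` and the example values are hypothetical; the factor 3.825 is taken from the text.

```python
def tsat_percent(plasma_iron_umol_l, transferrin_g_l):
    """Transferrin saturation (%) from plasma iron (umol/L) and transferrin (g/L)."""
    if transferrin_g_l <= 0:
        raise ValueError("transferrin must be positive")
    return 3.825 * plasma_iron_umol_l / transferrin_g_l

# Hypothetical patient: plasma iron 15 umol/L, transferrin 2.5 g/L
print(round(tsat_percent(15, 2.5), 2))
```

For these hypothetical inputs, the formula gives 3.825 × 15 / 2.5 = 22.95%.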
Statistical analyses
The Wilcoxon rank sum test was used for the median comparisons because the results were not normally distributed. Receiver operating characteristic (ROC) curves and area under the curve (AUC) were used to assess cutoffs. Reduced iron stores were expected to cause lower results for ferritin, TSAT, hemoglobin, and MCV and higher results for sTfR and transferrin. The significance of the difference between 2 AUCs was estimated with the bootstrap method of pROC.12 We used the Holm-Bonferroni correction over all the calculated P values to adjust for multiple comparisons. We initially considered P < .05 statistically significant; after Holm-Bonferroni correction, the threshold was P < 2e-04. In the analysis of sensitivity, specificity, and diagnostic odds ratio,7 different cutoffs were used (supplemental Table 2).
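The role of the expected direction in the ROC analysis can be illustrated with a small calculation. The empirical AUC equals the probability that a randomly chosen patient with reduced iron stores has a more "deficient-looking" result than a randomly chosen patient with full stores, with ties counted as one half. The Python sketch below implements this pairwise definition; it is not the pROC code used in the study, and the function name and toy values are hypothetical.

```python
def auc(values_reduced, values_full, lower_means_reduced=True):
    """Empirical AUC from pairwise comparisons (ties count as 0.5).

    lower_means_reduced=True suits markers expected to fall with reduced
    iron stores (eg, ferritin); False suits markers expected to rise
    (eg, sTfR or transferrin).
    """
    wins = 0.0
    for r in values_reduced:
        for f in values_full:
            if r == f:
                wins += 0.5
            elif (r < f) == lower_means_reduced:
                wins += 1.0
    return wins / (len(values_reduced) * len(values_full))

# Hypothetical ferritin values (ug/L)
reduced = [10, 20, 30]
full = [25, 100, 200]
print(auc(reduced, full))                             # direction as expected
print(auc(reduced, full, lower_means_reduced=False))  # direction reversed
```

With a fixed expected direction, a marker that moves the opposite way yields an AUC below 50%, which is how the hemoglobin result in this study should be read.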
To illustrate predictive efficacy, we calculated the cutoffs with at least 95% positive predictive value (PPV) for different preprobabilities. Of these, the cutoff with the highest sensitivity was reported as the 95% PPV cutoff. Similarly, we calculated the cutoffs with at least 95% negative predictive value (NPV), and the cutoff with the highest specificity was reported as the 95% NPV cutoff.
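The cutoff search described above can be sketched in a few lines. The following Python example scans candidate cutoffs for a marker where low values indicate reduced iron stores, keeps those with PPV of at least 95% at a given preprobability, and returns the one with the highest sensitivity. It is a simplified illustration under stated assumptions (candidate cutoffs are the observed values, a test is positive when the value is below the cutoff, both classes are present in the data); the function names and toy data are hypothetical.

```python
def ppv_at(sens, spec, preprobability):
    """PPV from sensitivity, specificity, and preprobability (Bayes' rule)."""
    true_pos = sens * preprobability
    false_pos = (1 - spec) * (1 - preprobability)
    total = true_pos + false_pos
    return true_pos / total if total > 0 else float("nan")

def highest_sensitivity_cutoff(values, reduced, preprobability, min_ppv=0.95):
    """Return (cutoff, sensitivity, specificity, ppv) or None if no cutoff qualifies."""
    n_pos = sum(1 for r in reduced if r)
    n_neg = len(reduced) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, r in zip(values, reduced) if r and v < cutoff)
        fp = sum(1 for v, r in zip(values, reduced) if not r and v < cutoff)
        if tp + fp == 0:
            continue  # no positive tests at this cutoff
        sens = tp / n_pos
        spec = 1 - fp / n_neg
        ppv = ppv_at(sens, spec, preprobability)
        if ppv >= min_ppv and (best is None or sens > best[1]):
            best = (cutoff, sens, spec, ppv)
    return best

# Hypothetical ferritin values (ug/L) and bone marrow status (True = reduced)
values = [5, 8, 12, 15, 20, 28, 40, 60, 90, 150]
reduced = [True, True, True, True, True, False, True, False, False, False]
print(highest_sensitivity_cutoff(values, reduced, preprobability=0.5))
```

The 95% NPV cutoff follows the same pattern with NPV in place of PPV and the highest specificity as the selection criterion.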
We estimated CIs for sensitivities, specificities, diagnostic odds ratios, and cutoffs with 95% PPV or NPV with manual bootstrapping. First, we created 1000 random sets of paired biomarker results and iron storage levels, each the same size as the original data. Then, we reran the calculations for each set. From these calculations, the lower and upper bounds of the CI were calculated using a nonparametric method.
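The bootstrapping procedure can be sketched as follows for one statistic, the sensitivity at a fixed cutoff. This Python example is an illustration only: it assumes resampling of pairs with replacement and a percentile interval as the nonparametric method, neither of which is spelled out in the text, and the function names, seed, and toy data are hypothetical.

```python
import random

def sensitivity(values, reduced, cutoff):
    """Share of reduced-store cases with a value below the cutoff."""
    flagged = [v < cutoff for v, r in zip(values, reduced) if r]
    return sum(flagged) / len(flagged)

def bootstrap_ci(values, reduced, cutoff, n_sets=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for sensitivity (assumed method)."""
    rng = random.Random(seed)
    pairs = list(zip(values, reduced))
    estimates = []
    for _ in range(n_sets):
        # Resample pairs with replacement, same size as the original data.
        resample = [rng.choice(pairs) for _ in pairs]
        if not any(r for _, r in resample):
            continue  # sensitivity is undefined without reduced-store cases
        vs = [v for v, _ in resample]
        rs = [r for _, r in resample]
        estimates.append(sensitivity(vs, rs, cutoff))
    estimates.sort()
    tail = (1 - level) / 2
    lo = estimates[int(tail * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int((1 - tail) * len(estimates)))]
    return lo, hi

# Hypothetical ferritin values (ug/L) and bone marrow status (True = reduced)
values = [5, 8, 12, 15, 20, 28, 40, 60, 90, 150]
reduced = [True, True, True, True, True, False, True, False, False, False]
print(sensitivity(values, reduced, cutoff=30), bootstrap_ci(values, reduced, cutoff=30))
```

Resampling the biomarker result and the iron storage level together, as a pair, preserves their relationship within each bootstrap set.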
Multivariate analysis
We fitted 2 multivariate logistic regression models with a binary outcome of reduced or full bone marrow iron stores. Both models were fitted separately for females and males. For the first model (ModelAIC), we selected the biomarkers with backward stepwise selection based on the Akaike information criterion.13 Given that the analysis requires that all the patients have all the biomarkers available, we enlarged the number of patients by expanding the data after each step. The expansion was done by selecting the aspirates with the remaining biomarkers. The starting markers were age and biomarkers ferritin, TSAT, sTfR, transferrin, plasma iron, hemoglobin, MCV, erythrocyte count, hematocrit, red cell distribution width, mean red cell hemoglobin (MCH), MCH concentration, reticulocyte count, C-reactive protein, and erythrocyte sedimentation rate. For the second model (ModelFe), we used age and the biomarkers ferritin, sTfR, transferrin, plasma iron, hemoglobin, MCV, erythrocyte count, hematocrit, red cell distribution width, MCH, and MCH concentration.
Software
R version 4.1.2 was used for data analysis.14 We created the figures with the packages ggplot2 (version 3.3.5)15 and pROC (version 1.18.0).12 We selected analytes for the multivariate analysis with MASS (version 7.3.54).13 We did all data analysis inside HUS Acamedic, which is a secure, scalable, virtual, audited operating environment.16
This study was approved by the HUS Medical Research Committee (§47/2021), and the Declaration of Helsinki principles were followed. Ethical committee review was not required because the study was registry based. Data from patients who opted out of medical research were not used.
Results
In this large cohort of 14 195 bone marrow aspirates, the bone marrow iron stains were classified as abundant iron, normal iron, low iron, and no iron stores by a regular expression algorithm. After exclusions, 6610 aspirates (1 per patient) remained, with iron stores classified as abundant (n = 1803), normal (n = 3309), low (n = 646), or none (n = 852) (Figure 1; supplemental Table 3). Of the patients, 46% were females, and the median age was 66 years for females and 67 years for males. Approximately one-third of the patients did not have anemia (hemoglobin >120 g/L for females and >130 g/L for males), and most patients with anemia had moderate anemia (hemoglobin <110 g/L and >80 g/L) (Tables 1 and 2; supplemental Table 3). Of all the patients, 11% had myeloproliferative neoplasia, 13% had myelodysplastic neoplasia, 31% had other hematologic malignancy, and 3% had non-IDA; 41% had none of these conditions (Table 1; supplemental Table 12).
Descriptive statistics of the studied biomarkers in different patient groups
| Patient group | Sex | n | Age, median (IQR), y | Ferritin, median (IQR), μg/L | TSAT, median (IQR), % | sTfR, median (IQR), mg/L | Transferrin, median (IQR), g/L | Hemoglobin, median (IQR), g/L | MCV, median (IQR), fL |
|---|---|---|---|---|---|---|---|---|---|
| All | Both | 6610 | 66 (55-75) | 180 (68-458) | 23 (15-34) | 3.7 (2.6-5.6) | 2.2 (1.8-2.7) | 116 (100-133) | 91 (87-97) |
| All | Female | 3065 | 66 (54-76) | 127 (46-360) | 21 (14-31) | 3.8 (2.7-5.9) | 2.3 (1.9-2.8) | 113 (98-130) | 91 (86-96) |
| All | Male | 3545 | 67 (56-75) | 234 (100-566) | 25 (16-35) | 3.6 (2.6-5.3) | 2.2 (1.8-2.6) | 119 (102-137) | 92 (87-97) |
| MPN | Female | 391 | 65 (52-75) | 59 (25-112) | 23 (13-30) | 4.0 (2.8-6.8) | 2.7 (2.4-3) | 137 (124-151) | 89 (85-93) |
| MPN | Male | 353 | 63 (54-71) | 108 (33-202) | 24 (14-34) | 3.4 (2.7-5.3) | 2.5 (2.3-2.9) | 150 (128-167) | 89 (85-93) |
| MDN | Female | 444 | 65 (51-76) | 138 (60-404) | 28 (18-39) | 3.3 (2.4-4.7) | 2.4 (2.1-2.7) | 116 (98-133) | 91 (87-98) |
| MDN | Male | 493 | 67 (56-76) | 292 (128-574) | 30 (21-42) | 3.8 (2.8-5.4) | 2.3 (1.9-2.6) | 119 (100-143) | 93 (88-99) |
| HM | Female | 957 | 68 (59-76) | 253 (95-626) | 21 (15-33) | 3.7 (2.5-5.9) | 2.0 (1.7-2.4) | 106 (94-120) | 93 (88-97) |
| HM | Male | 1220 | 67 (57-75) | 378 (132-939) | 27 (17-39) | 3.7 (2.5-6.1) | 2.0 (1.7-2.4) | 111 (97-128) | 93 (88-98) |
| non-IDA | Female | 125 | 60 (43-74) | 256 (70-677) | 29 (22-64) | 4.8 (2.5-7.2) | 2.1 (1.7-2.7) | 96 (83-110) | 95 (89-100) |
| non-IDA | Male | 103 | 58 (46-72) | 388 (172-785) | 29 (22-42) | 4.3 (2.3-5.8) | 2.0 (1.7-2.3) | 103 (88-124) | 96 (89-101) |
| Others | Female | 1236 | 66 (51-76) | 114 (36-323) | 19 (10-27) | 4.0 (2.8-5.9) | 2.3 (1.9-2.8) | 112 (99-127) | 90 (85-95) |
| Others | Male | 1493 | 68 (57-77) | 220 (99-512) | 23 (14-33) | 3.5 (2.6-4.8) | 2.2 (1.7-2.6) | 119 (103-133) | 91 (87-96) |
HM, hematologic malignancy; IQR, interquartile range; MDN, myelodysplastic neoplasia; MPN, myeloproliferative neoplasia.
Prevalence of iron deficiency and anemia in study groups and the AUC of the ROC analysis
| Biomarker | Sex | n | Reduced iron stores, % | Any anemia, % | AUC, % (95% CI) |
|---|---|---|---|---|---|
| Ferritin | Female | 1601 | 27 | 66 | 88 (86-90) |
| Ferritin | Male | 1861 | 18 | 71 | 89 (87-91) |
| TSAT | Female | 1092 | 27 | 67 | 74 (71-78) |
| TSAT | Male | 1312 | 19 | 72 | 77 (73-80) |
| sTfR | Female | 756 | 30 | 81 | 73 (69-77) |
| sTfR | Male | 816 | 23 | 84 | 72 (68-76) |
| Transferrin | Female | 1131 | 28 | 68 | 85 (82-88) |
| Transferrin | Male | 1356 | 19 | 72 | 79 (76-82) |
| Hemoglobin | Female | 3065 | 27 | 61 | 42 (40-44) |
| Hemoglobin | Male | 3545 | 19 | 67 | 47 (45-50) |
| MCV | Female | 3064 | 27 | 61 | 68 (66-71) |
| MCV | Male | 3541 | 19 | 67 | 64 (61-66) |
| ModelAIC | Female | 1005 | 28 | 66 | 89 (86-91) |
| ModelAIC | Male | 415 | 21 | 82 | 87 (83-92) |
| ModelFe | Female | 402 | 33 | 77 | 92 (89-95) |
| ModelFe | Male | 407 | 21 | 82 | 88 (84-93) |
Bone marrow iron stores showed a positive correlation with ferritin, TSAT, and MCV in a dose-dependent fashion: higher bone marrow iron content was associated with higher biomarker levels (Figure 2). Conversely, sTfR, transferrin, and hemoglobin levels were lower when the amount of stainable bone marrow iron was higher (Figure 2).
Box plots of different biomarker levels in different iron store levels. (A) Ferritin in logarithmic scale, (B) TSAT, (C) sTfR, (D) transferrin, (E) hemoglobin, and (F) MCV. The box plots are grouped by bone marrow iron storage in females (blue) and males (red). Inside boxes, thick lines are medians; notches around them are 95% CI. The box is interquartile range (IQR). The whiskers equal minimum or maximum up to 1.5 times IQR. Outliers (>1.5 times IQR from edges of box) are marked as dots.
Ferritin shows a robust correlation with bone marrow iron stores
Median levels of ferritin in females and males with no bone marrow iron were 21 and 33 μg/L, low iron 38 and 67 μg/L, normal iron 149 and 246 μg/L, and abundant iron 510 and 570 μg/L, respectively (Figure 2; supplemental Table 4).
Of the studied biomarkers, ferritin showed the best prediction for reduced bone marrow iron stores. Ferritin had the highest AUC of the studied biomarkers, with an AUC of 88% for females and 89% for males (Figure 3; Table 2). The commonly used cutoff of 30 μg/L showed a sensitivity of 54% for females and 35% for males and a specificity of 97% for females and 99% for males (supplemental Table 2). In addition, different ferritin cutoffs used in the literature as surrogates for iron deficiency were tested in these data (supplemental Table 2).
ROC curves with 95% CI of ferritin (light green), TSAT (brown), sTfR (pink), transferrin (orange), hemoglobin (dark green), and MCV (purple) for females and males.
Ferritin was superior to TSAT, sTfR, transferrin, hemoglobin, and MCV
Other studied biomarkers had inferior AUCs to ferritin (P < 1e-06) except for transferrin for females (P = .04) (Figure 3; Table 2; supplemental Table 5). Hemoglobin had an AUC of <50% because, unexpectedly, hemoglobin levels were lower in patients with higher bone marrow iron stores (Figure 3; Table 2). MCV had some diagnostic value but a clearly inferior AUC (Figure 3; Table 2). The common cutoff of 80 fL for MCV had low sensitivity but good specificity (supplemental Table 2).
Multivariate models did not perform better than ferritin alone
The biomarkers selected for the final multivariate model ModelAIC were ferritin, transferrin, hemoglobin, and MCH for females and ferritin, transferrin, sTfR, hematocrit, MCV, and erythrocyte count for males. The steps of biomarker selection are presented in supplemental Table 6. Both logistic regression models, ModelAIC and ModelFe, resulted in ROC curves that overlapped the CI of the ROC curve of ferritin. None of the differences between AUCs of the models and AUCs of ferritin were significant (Figure 4; Table 2; supplemental Table 5). Thus, neither model proved to be better than ferritin alone. The P values of the variables for the models are presented in supplemental Tables 7 and 8.
ROC curves with 95% CI of the logistic regression models and ferritin. Panels A-B have ferritin (light green) and ModelAIC (yellow). The P values for the difference of the AUCs were 0.78 for females and 0.57 for males. Panels C-D have ROC curves for ferritin (light green) and ModelFe (brown). With them, the P values for the difference of the AUCs were 0.04 for females and 0.82 for males.
Ruling in and out iron deficiency at different clinical preprobability scenarios
The ferritin cutoffs with 95% PPV and 95% NPV for different preprobabilities of reduced iron stores are shown in Figure 5, in linear scale in supplemental Figure 2, and in supplemental Table 9. With 50% preprobability, the highest cutoff with 95% PPV was 30 μg/L for females and 51 μg/L for males, and the lowest cutoff with NPV of at least 95% was 1750 μg/L for females and 4967 μg/L for males. With other biomarkers, ruling in or out reduced iron stores with 95% probability required very high or low cutoffs or was impossible (supplemental Table 9; supplemental Figures 3-7).
The cutoffs of ferritin that have at least 95% PPV or at least 95% NPV to predict reduced iron stores with different preprobabilities. The area below the blue line has >95% certainty of reduced iron stores, the area above the red line has >95% certainty of full iron stores, and the area between the blue and red lines has neither. The shaded blue and red areas represent the 95% CI for the cutoffs. Ferritin is in logarithmic scale. The same results in linear scale are presented in supplemental Figure 2. If the preprobability is estimated for a patient, the cutoffs for ruling in iron deficiency can be seen from the figure. For example, for an imaginary patient with 50% preprobability for reduced iron stores, the highest cutoff for ferritin with 95% PPV is 30 μg/L (females) or 51 μg/L (males), and the lowest cutoff with NPV of at least 95% is 1750 or 4967 μg/L, respectively. This means that if the clinician estimates that a patient has a 50% chance of having reduced iron stores, a ferritin level of <30 μg/L (females) or <51 μg/L (males) indicates at least a 95% chance of having reduced iron stores. However, in the same scenario, only a ferritin level of >1750 μg/L (females) or >4967 μg/L (males) can rule out reduced iron stores with 95% probability.
Additional analyses
We also tested the performance of these biomarkers against a more stringent definition of iron deficiency, that is, no stainable bone marrow iron, after removing patients with low bone marrow iron stores. This resulted in slightly higher AUCs, but the differences were not significant for any of the biomarkers (supplemental Tables 10 and 11).
We further explored 6 age groups and 5 patient groups: myeloproliferative neoplasia, myelodysplastic neoplasia, hematologic malignancy, non-IDA, and others (see supplemental Table 12 for definitions). The results in all groups were similar to those in the full data set. A few groups had AUC values that differed significantly from those of other patients, but none of these differences involved ferritin (supplemental Tables 13-18; supplemental Figures 8-13).
We also compared biomarker performance between patients with C-reactive protein of <10 mg/L and those with ≥10 mg/L. After Holm-Bonferroni correction, there were no significant differences in AUCs (supplemental Tables 19 and 20).
Discussion
To the best of our knowledge, we show in the largest data set to date that ferritin is the best predictor of bone marrow iron stores in a real-life patient population. We studied the most common laboratory variables used for detecting IDA, that is, ferritin, TSAT, sTfR, transferrin, hemoglobin, and MCV, and compared those with bone marrow iron staining in >6600 patients with hematologic disorders. In ROC analysis, ferritin had the highest AUC in all patient groups, which is congruent with previous studies.6,17 Previously, lower cutoffs of 12 to 15 μg/L have been used, but in patients with anemia, the 30 μg/L cutoff has been shown to have higher sensitivity while retaining excellent specificity for iron deficiency.18 With the ferritin cutoff of 30 μg/L, our study shows similarly excellent specificities as previous studies, but the sensitivities in our study were significantly lower.7 A possible reason for this difference is that hematologic cancers and myeloproliferative disorders and their treatments can affect the diagnostic performance of ferritin. Nevertheless, we show that, even in this patient population, commonly used decision-making thresholds for ferritin in diagnosing iron deficiency are specific. Thus, a low ferritin reliably predicts reduced iron stores, regardless of the patient’s diagnoses and older age. However, the clinician cannot rule out iron deficiency with reasonable certainty if ferritin levels are normal.
In earlier studies, the studied populations have been small, the inclusion criteria diverse, and the used cutoffs for ferritin varied.7 In addition, there has been a lack of harmonization among different manufacturers of ferritin analyzers.10 These can explain the heterogeneity of results of the previous studies.7 All the analyzers in our study have been verified for clinical use, which strengthens our findings. We verified the comparability of ferritin results across the years by comparing patient result medians by year, and these did not have significant differences (supplemental Figure 14).
Data on ferritin as a marker of bone marrow iron in patients with hematologic disorders are lacking. Nine studies that had a blood disorder as an inclusion criterion were included in a recent systematic review.7 All these studies had <100 participants, and the studied ferritin cutoffs varied from 12 to 100 μg/L. Only one of these studied the cutoff of 30 μg/L, resulting in sensitivity of 29% and specificity of 98% in patients with sickle cell anemia.19 Our study shows that ferritin performs even better in patients with diagnosed hematologic disorders (supplemental Tables 2 and 13). Moreover, a multivariate analysis combining other blood tests with ferritin did not yield better prediction than ferritin alone. This suggests that ferritin alone is often useful, and other biomarkers do not add significant diagnostic value if ferritin is available.
Transferrin performed second best of the studied iron biomarkers, and for females it was better than TSAT. This finding does not support the use of TSAT as an alternative to ferritin. Transferrin, TSAT, and sTfR each alone had some diagnostic significance, but the diagnostic accuracies were too low to support the use of any of those alone in a patient population with hematologic diseases. These performances are not as good as previously reported in apparently healthy populations, which could be caused by the underlying hematologic diseases in our study population.6,17,20
In this study, hemoglobin concentration alone had poor diagnostic accuracy in detecting reduced bone marrow iron even though iron deficiency is the most common cause of anemia.1 Unexpectedly, lower hemoglobin was more common in patients with full iron stores (Figure 2E). This probably reflects the underlying hematologic diseases of the patient population and the selection criteria for bone marrow aspiration. In addition, patients with hematologic disease–derived anemia may receive repeated blood transfusions, which can result in excessive iron stores. However, in the general population, iron deficiency is the most common cause of anemia, and low hemoglobin is frequently indicative of reduced iron stores.2
MCV showed good specificity but low sensitivity in diagnosing reduced iron stores. Thus, MCV is a useful tool for clinicians to strengthen suspicion of iron deficiency, but a normal or high MCV does not rule out iron deficiency. In our patient population, underlying bone marrow diseases can increase the MCV and thus reduce its utility. In the general population, MCV is more accurate.8 However, the low sensitivity of MCV is an issue even in more general populations: among older patients, most of those with IDA can have normal MCVs.21
The diagnostic efficacy of a test is often measured by its sensitivity, specificity, ROC curve, and AUC. However, in clinical practice, the PPV and NPV are the most relevant measures. PPV and NPV can be calculated for a cutoff with known sensitivity and specificity if the preprobability is known. In a clinical setting, the preprobability is estimated from the patient’s background information (eg, age, sex, medication, chronic medical conditions), symptoms, clinical tests, and other diagnostic tests. For example, an 18-year-old female in the United States has ∼35% preprobability of iron deficiency,22 and if she is known to have heavy menstrual bleeding and fatigue, the preprobability is even higher. To illustrate the preprobability’s significance, we presented PPV and NPV in a novel way, as a continuous function of preprobability (Figure 5; shown in linear scale in supplemental Figure 2; supplemental Table 9). We show that ruling in iron deficiency with 95% PPV is feasible at least for ferritin. However, ruling out iron deficiency with 95% NPV is more difficult using any of the studied laboratory parameters: typically, a result far beyond the reference limits would be required, and thus, these cutoffs are not useful for clinical practice. The diagnostic gray area even for the best-performing biomarker, ferritin, is markedly wide (Figure 5), and this is even more evident for the other biomarkers (supplemental Figures 3-7; supplemental Table 9).
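The dependence of PPV and NPV on preprobability follows directly from Bayes' theorem. The following is a minimal sketch of that calculation, not the code used in this study: the function names are illustrative, and the inputs are the sensitivity (54%) and specificity (97%) reported for females at the ferritin cutoff of 30 μg/L.

```python
def ppv(sensitivity, specificity, preprobability):
    """Positive predictive value: P(reduced iron stores | test positive)."""
    true_pos = sensitivity * preprobability
    false_pos = (1.0 - specificity) * (1.0 - preprobability)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, preprobability):
    """Negative predictive value: P(normal iron stores | test negative)."""
    true_neg = specificity * (1.0 - preprobability)
    false_neg = (1.0 - sensitivity) * preprobability
    return true_neg / (true_neg + false_neg)


# Females, ferritin cutoff 30 μg/L (values taken from the reported results).
SENS, SPEC = 0.54, 0.97

# Illustrative preprobabilities; these particular values are an assumption
# chosen only to show how the predictive values shift.
for pre in (0.10, 0.35, 0.50, 0.80):
    print(
        f"preprobability {pre:.0%}: "
        f"PPV {ppv(SENS, SPEC, pre):.1%}, NPV {npv(SENS, SPEC, pre):.1%}"
    )
```

At 50% preprobability, this gives a PPV of ∼95% but an NPV of only ∼68%, which illustrates why a ferritin result above the cutoff does little to rule out iron deficiency when sensitivity is low.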
Thus, we show that a single biomarker concentration cutoff cannot be used in all patients to rule in or rule out iron deficiency with confidence. In addition to patient-specific factors, clinical laboratories use different commercial ferritin assays that can provide slightly variable ferritin concentration results.10 In many cases, exact data on this variation may not exist. This analytical variation should also be considered when applying results from published studies to clinical practice.
Our simple multivariate analysis did not yield better prediction than ferritin alone. This was true even though we did not divide the patients into training and validation groups, and thus, the models may have been biased toward better apparent performance.23 This finding reinforces that other biomarkers add little diagnostic value when ferritin is available.
Data are lacking on whether even low (but stainable) bone marrow iron stores are sufficient to prevent IDA or whether IDA will develop only when bone marrow iron is totally absent. In previous studies comparing iron biomarkers with bone marrow iron, some used absent staining whereas others used low staining as a cutoff.7,24 Because the effects of iron deficiency start before the iron stores are completely depleted,6 our main analyses focused on diagnosing patients with reduced bone marrow iron (none or low bone marrow iron). However, we also obtained similar but slightly higher AUCs for the studied iron biomarkers when they were used to distinguish no bone marrow iron from normal and abundant iron stores (supplemental Tables 10 and 11). Given that the relative differences between the biomarkers and the models did not change significantly, this analysis does not change our conclusions.
Our study has some limitations to consider. The bone marrow samples were taken based on clinical need; thus, the patient population is highly selected and, on average, older, and the results cannot be generalized to healthier or younger populations. Our study population consisted of a heterogeneous group of patients with hematologic disorders. However, especially for an inpatient setting, our data provide important real-life evidence. We provide data on iron storage biomarkers in a large sample population that is relevant for clinical decision making. For previously healthy individuals, diagnosing iron deficiency is usually relatively straightforward. In hospitalized patients and those with serious chronic illnesses, determining whether anemia is caused by iron deficiency is more difficult.
Another limitation was that the diagnoses and data on IV iron supplementation were collected from electronic health records, which may not be fully reliable because some data may be missing. Moreover, data on oral iron supplementation were lacking and thus could not be accounted for. However, it is unlikely that the missing data would alter our results, given that bone marrow samples are rarely taken if the patient is obviously iron deficient and receives iron supplementation. Another potential limitation with bone marrow stains is staining artifact and mischaracterization of the iron stores, given that the interpretation of iron stains is always somewhat subjective. Some subjectivity was also observed in the analysis of the variation in interpretations (supplemental Table 1). However, the variation was not time related but random, and thus, its effect on this large data set does not alter the conclusions. In addition, choosing primarily samples with no or low iron may have increased the proportion of erroneous aspirates. However, our large sample size likely mitigates these sources of mischaracterization, and given that our main analyses focused on the diagnosis of reduced iron stores (none or low bone marrow iron), misclassification between these 2 categories would not affect the main findings.
Our study shows that ferritin is clearly the best single biomarker for estimating bone marrow iron stores in an inpatient setting with older patients. However, there is a wide diagnostic gray area in which the clinician cannot rule out or rule in iron deficiency with 95% probability. Combining multiple biomarkers in multivariate analysis did not significantly improve iron store prediction compared with prediction based on ferritin alone. More research is needed to find better biomarkers for iron storage, as well as diagnostic cutoffs for ferritin in common conditions.
Acknowledgments
The authors acknowledge Sanna Siitonen for her assistance in interpreting the verification samples, HUS Data Lake and Data Production for providing the patient data, and HUS Acamedic for providing the secure operating environment.
This work was supported by funding from HUS Diagnostic Center in 2022-2024 (T.L. and L.J.-K.) and Blood Disease Research Foundation (P.S.). Open access was funded by the Helsinki University Library.
T.L. is a PhD candidate at the University of Helsinki. This work is submitted in partial fulfillment of the requirement for the PhD.
All the sponsors of this study are public or nonprofit organizations.
Authorship
Contribution: T.L., P.S., T.H., and L.J.-K. designed the study; T.L. performed the data mining and statistical analyses and made the figures; P.S. and A.L. interpreted the verification samples; T.L., P.S., A.L., T.H., and L.J.-K. wrote the manuscript; and T.H. and L.J.-K. supervised the research.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Tapio Lahtiharju, HUS Diagnostic Center, Topeliuksenkatu 32, 00029 HUS, Helsinki, Finland; email: tapio.lahtiharju@hus.fi.
References
Author notes
Presented as a poster at the 25th International Congress of Clinical Chemistry and Laboratory Medicine, Rome, Italy, 21 May 2023, and at the 39th Nordic Congress in Clinical Chemistry, Stockholm, Sweden, 17 to 20 September 2024.
Data are available on reasonable request from the corresponding author, Tapio Lahtiharju (tapio.lahtiharju@hus.fi). The data from this study cannot be made public because the data are registry based with restricted access.
The full-text version of this article contains a data supplement.