Clinical and laboratory characteristics of the training and validation data sets
Characteristic | Training (NIH), all | Training (NIH), label: acquired | Training (NIH), label: inherited | Training (NIH), cluster A | Training (NIH), cluster B | Validation (USP), all | Validation (USP), label: acquired | Validation (USP), label: inherited | Validation (USP), cluster A | Validation (USP), cluster B |
---|---|---|---|---|---|---|---|---|---|---|
No. of patients (%) | 359 (100) | 232 (65) | 127 (35) | 300 (84) | 59 (16) | 127 (100) | 92 (72) | 35 (28) | 110 (87) | 17 (13) |
Labels (%) | ||||||||||
Inherited | 127 (35.3) | | | 90 (30) | 37 (63) | 35 (27.5) | | | 29 (26) | 6 (35) |
Acquired | 232 (64.6) | | | 210 (70) | 22 (37) | 92 (72.4) | | | 81 (74) | 11 (65) |
Sex (%) | ||||||||||
Female | 174 (48) | 119 (51) | 55 (43) | 140 (47) | 34 (58) | 58 (46) | 49 (53) | 9 (26) | 47 (43) | 11 (65) |
Male | 185 (52) | 113 (49) | 72 (57) | 160 (53) | 25 (42) | 69 (54) | 43 (47) | 26 (74) | 63 (57) | 6 (35) |
Median age (range), y | 28 (1-86) | 34 (3-86) | 17 (1-61) | 29 (1-86) | 24 (3-66) | 23 (1-83) | 27 (1-82) | 15 (1-52) | 24 (1-82) | 10 (1-49) |
Laboratory counts (mean ± SD) | ||||||||||
Red blood cell counts (10³/dL) | 3.06 ± 0.78 | 2.90 ± 0.75 | 3.36 ± 0.78 | 2.9 ± 0.7 | 4.93 ± 0.59 | 2.83 ± 0.97 | 2.7 ± 0.98 | 3.1 ± 0.9 | 2.7 ± 0.8 | 3.9 ± 1.2 |
Hemoglobin (g/dL) | 9.92 ± 2.29 | 9.2 ± 2.05 | 11.2 ± 2.2 | 9.4 ± 2.1 | 12.3 ± 1.5 | 8.96 ± 2.70 | 8.6 ± 2.7 | 9.9 ± 2.6 | 8.7 ± 2.5 | 10.8 ± 3.0 |
Mean corpuscular volume (mean ± SD) | 94 ± 11 | 93 ± 11 | 98 ± 11 | 95 ± 11 | 92 ± 10 | 96 ± 12 | 96 ± 12 | 98 ± 12 | 98 ± 11 | 87 ± 12 |
Platelets (10³/dL) | 63 ± 76 | 47 ± 74 | 92 ± 72 | 35 ± 30 | 206 ± 78 | 58 ± 84 | 53 ± 81 | 71 ± 91 | 29 ± 26 | 249 ± 79 |
Neutrophils (10³/dL) | 1.1 ± 1.1 | 0.9 ± 1.1 | 1.5 ± 1.1 | 1.0 ± 1 | 1.63 ± 1.3 | 1 ± 0.86 | 0.9 ± 0.8 | 1.2 ± 1.1 | 1.0 ± 0.9 | 0.9 ± 0.7 |
Red cell distribution width | 15 ± 3 | 15.7 ± 3.2 | 15 ± 2.9 | 16 ± 3 | 13 ± 1.5 | 16 ± 3 | 16 ± 3.1 | 16 ± 2.9 | 16 ± 3 | 15 ± 2.4 |
Lymphocytes (10³/dL) | 1.48 ± 0.83 | 1.5 ± 0.8 | 1.5 ± 0.9 | 1.46 ± 0.8 | 1.6 ± 0.8 | 1.6 ± 1.1 | 1.6 ± 1.2 | 1.6 ± 0.7 | 1.46 ± 0.9 | 2.6 ± 1.9 |
Monocytes (10³/dL) | 0.21 ± 0.2 | 0.16 ± 0.16 | 0.32 ± 0.22 | 0.19 ± 0.19 | 0.35 ± 0.2 | 0.2 ± 0.16 | 0.2 ± 0.17 | 0.2 ± 0.14 | 0.2 ± 0.15 | 0.3 ± 0.17 |
Eosinophils (10³/dL) | 0.05 ± 0.1 | 0.03 ± 0.06 | 0.07 ± 0.1 | 0.03 ± 0.08 | 0.12 ± 0.1 | 0.05 ± 0.12 | 0.05 ± 0.13 | 0.05 ± 0.08 | 0.04 ± 0.12 | 0.1 ± 0.13 |
Basophils (10³/dL) | 0.01 ± 0.03 | 0.008 ± 0.02 | 0.02 ± 0.04 | 0.01 ± 0.02 | 0.027 ± 0.05 | 0.01 ± 0.03 | 0.01 ± 0.03 | 0.006 ± 0.02 | 0.01 ± 0.03 | 0.02 ± 0.04 |
Reticulocytes (10³/dL) | 44.1 ± 28.2 | 38 ± 27 | 55 ± 27 | 43 ± 29 | 48 ± 24 | 55.1 ± 38 | 52 ± 36 | 63 ± 42 | 53 ± 38 | 71 ± 33 |
Presence of PNH clones, n (%)∗ | 63 (17) | 62 (27) | 1 (0.7) | 61 (20) | 2 (3) | 15 (12) | 14 (15) | 1 (3) | 15 (14) | 0 |
(n, % missing values) | 38 (11) | 11 (5) | 27 (21) | 28 (10) | 10 (17) | 11 (9) | 6 (6) | 2 (6) | 5 (4) | 3 (18) |
Abnormal karyotype, n (%)∗ | 41 (11) | 21 (9) | 20 (16) | 30 (10) | 11 (19) | 9 (7) | 8 (9) | 1 (3) | 7 (6) | 2 (12) |
Complex or monosomy 7, n (%) | 13 (4) | 7 (3) | 6 (5) | 11 (4) | 2 (3) | 3 (2) | 2 (2) | 1 (3) | 3 (3) | 0 |
(n, % missing values) | 18 (5) | 4 (2) | 14 (11) | 16 (5) | 2 (3) | 77 (60) | 46 (50) | 23 (65) | 61 (55) | 8 (47) |
Telomere length, n (%) | ||||||||||
Normal | 192 (53) | 164 (71) | 28 (22) | 159 (53) | 33 (56) | 79 (62) | 68 (74) | 11 (31) | 67 (61) | 12 (71) |
<10th percentile | 56 (16) | 39 (17) | 17 (13) | 48 (16) | 8 (14) | 13 (10) | 11 (12) | 2 (6) | 13 (12) | 0 |
<First percentile | 111 (31) | 29 (13) | 82 (65) | 93 (31) | 18 (31) | 35 (28) | 13 (14) | 22 (63) | 30 (27) | 5 (29) |
Bone marrow cellularity for age, n (%) | ||||||||||
Hypocellular | 331 (92.2) | 218 (94) | 113 (90) | 284 (94.7) | 47 (80) | 111 (87) | 79 (86) | 32 (91) | 103 (94) | 8 (47) |
Normocellular | 24 (6.7) | 10 (4) | 14 (11) | 14 (4.7) | 10 (17) | 16 (13) | 13 (14) | 3 (9) | 7 (6) | 9 (53) |
Hypercellular | 4 (1.1) | 4 (2) | 0 | 2 (0.7) | 2 (3) | 0 | 0 | 0 | 0 | 0 |
Dysplasia or increased blasts in bone marrow biopsy, n (%) | 19 (5) | 9 (4) | 10 (8) | 16 (5) | 3 (5) | 16 (13) | 9 (10) | 7 (20) | 13 (12) | 3 (18) |
Clinical data, n (%) | ||||||||||
Presence of DC clinical triad | 32 (9) | 4 (2) | 28 (22) | 28 (9) | 4 (7) | 17 (13) | 4 (4) | 13 (37) | 15 (14) | 2 (12) |
Presence of abnormal cutaneous findings | 44 (12) | 7 (3) | 37 (29) | 27 (9) | 17 (29) | 5 (4) | 2 (2) | 3 (9) | 4 (4) | 1 (6) |
Presence of physical anomalies | 72 (20) | 12 (5) | 60 (47) | 41 (14) | 31 (53) | 4 (3) | 1 (1) | 3 (9) | 2 (2) | 2 (12) |
Presence of multiorgan diseases | 87 (24) | 29 (13) | 58 (46) | 66 (22) | 21 (36) | 15 (12) | 3 (3) | 12 (34) | 12 (11) | 3 (18) |
Long-standing cytopenias or macrocytosis | 66 (18) | 11 (5) | 55 (43) | 44 (15) | 22 (37) | 5 (4) | 1 (1) | 4 (11) | 4 (4) | 1 (6) |
Long-standing history of recurrent bleeding and infections | 109 (30) | 47 (20) | 62 (49) | 82 (27) | 27 (46) | 6 (5) | 5 (5) | 1 (3) | 5 (5) | 1 (6) |
Immunodeficiency | 20 (6) | 7 (3) | 13 (10) | 9 (3) | 11 (19) | 1 (1) | 0 | 1 (3) | 1 (1) | 0 |
Proband with early gray hair | 20 (6) | 9 (4) | 11 (9) | 13 (4) | 7 (12) | 2 (2) | 1 (1) | 1 (3) | 2 (2) | 0 |
Immediate family members with similar phenotype | 23 (6) | 9 (4) | 14 (11) | 21 (7) | 2 (3) | 0 | 0 | 0 | 0 | 0 |
Extended family members with similar phenotype | 60 (17) | 31 (13) | 29 (23) | 50 (17) | 10 (17) | 3 (2) | 3 (3) | 0 | 3 (3) | 0 |
Relatives with early gray hair | 32 (9) | 19 (8) | 13 (10) | 28 (9) | 4 (7) | 0 | 0 | 0 | 0 | 0 |
The DC (dyskeratosis congenita) triad is defined by at least 2 of the following: nail dystrophy, skin hyper/hypopigmentation, and leukoplakia.
SD, standard deviation.
∗Variables excluded from analysis because of a high number of missing values in cases labeled as inherited or in the validation cohort.
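As an illustration of the DC triad footnote, the "at least 2 of 3 findings" rule could be encoded as a simple binary feature. This is a minimal sketch, not the authors' implementation: the class name, field names, and the `min_findings` parameter are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MucocutaneousFindings:
    """Hypothetical container for the three triad findings (names are assumptions)."""
    nail_dystrophy: bool
    skin_pigmentation_change: bool  # hyper- or hypopigmentation
    leukoplakia: bool


def has_dc_triad(findings: MucocutaneousFindings, min_findings: int = 2) -> bool:
    """Return True when at least `min_findings` of the three findings are present."""
    present = sum(
        (
            findings.nail_dystrophy,
            findings.skin_pigmentation_change,
            findings.leukoplakia,
        )
    )
    return present >= min_findings


if __name__ == "__main__":
    patient = MucocutaneousFindings(
        nail_dystrophy=True,
        skin_pigmentation_change=True,
        leukoplakia=False,
    )
    print(has_dc_triad(patient))
```

Keeping the threshold as a parameter makes the footnote's definition (2 of 3) explicit while still allowing a stricter 3-of-3 reading to be tested.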