Six-color flow cytometry allows multiparameter analysis of large numbers of single cells. It is an excellent tool for characterizing a wide range of hematopoietic populations and for monitoring minimal residual disease. However, analysis of complex flow data is challenging. Gating populations on 28 two-parameter plots is extremely tedious and does not reflect the multidimensionality of the data. Here, we describe a novel approach that employs hierarchical clustering analysis (HCA) and support vector machine (SVM) learning to analyze flow data. This approach provides a new perspective on flow data and promises better identification of rare and novel subpopulations that escape classic analysis. Our aim was to identify normal and leukemic B-cell progenitor/stem cell populations in normal (n=6) and ALL (n=10) bone marrow. Samples were labelled with fluorochrome-conjugated antibodies to 6 CD markers (CD10, CD19, CD22, CD34, CD38, CD117), and 10^4 to 10^6 events were acquired (FACSCanto, BD Biosciences). To analyze flow data with HCA, we developed a new algorithm that is better suited to the ellipsoid nature of cell populations than other current HCA metrics. Data exported from DiVa software were externally compensated and Hyperlog transformed to achieve a logarithmic-like scale that displays zero and negative values. Normalized data were then subjected to HCA using a scale-invariant Mahalanobis distance measurement for merging clusters. This reflects the extended ellipsoid shape of the populations (here: 8-dimensional ellipsoids). We developed a new adaptive linkage algorithm that smoothly shifts from the Euclidean distance (when clusters are too small to compute the Mahalanobis distance) to the Mahalanobis distance. This allowed us to build the hierarchy from single events while retaining the advantage of the Mahalanobis measurement for larger clusters. To build classifiers, we used an SVM with a polynomial kernel. All work was carried out in MATLAB (MathWorks, Inc.).
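The shift from Euclidean to Mahalanobis distance described above can be illustrated with a minimal Python sketch. It is not the authors' MATLAB algorithm, whose details are not given here. The blending rule (a weighted mix of the cluster covariance and the identity matrix), the function names, and the `full_size` threshold are all assumptions made for illustration.

```python
import math

def mean_vector(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def covariance(points, mean):
    """Sample covariance matrix; all zeros for a single-point cluster."""
    n, dim = len(points), len(mean)
    cov = [[0.0] * dim for _ in range(dim)]
    if n < 2:
        return cov
    for p in points:
        d = [p[i] - mean[i] for i in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += d[i] * d[j]
    return [[v / (n - 1) for v in row] for row in cov]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x

def adaptive_distance(point, cluster, full_size=20):
    """Distance from point to the cluster mean (hypothetical blending rule).

    The weight w grows linearly from 0 (single event, plain Euclidean)
    to 1 (cluster of at least full_size events, Mahalanobis).
    full_size must be greater than 1 and is an assumed tuning parameter.
    """
    mean = mean_vector(cluster)
    dim = len(mean)
    w = min(1.0, (len(cluster) - 1) / (full_size - 1))
    cov = covariance(cluster, mean)
    # A tiny ridge on the identity term keeps the matrix invertible at w = 1.
    ident = max(1.0 - w, 1e-9)
    blended = [[w * cov[i][j] + (ident if i == j else 0.0)
                for j in range(dim)] for i in range(dim)]
    diff = [point[i] - mean[i] for i in range(dim)]
    z = solve(blended, diff)
    return math.sqrt(sum(diff[i] * z[i] for i in range(dim)))

# Elongated 2-D cluster: wide along x, narrow along y.
elongated = [(float(x), float(y)) for x in range(-10, 11) for y in (-1, 0, 1)]
d_single = adaptive_distance((3.0, 4.0), [(0.0, 0.0)])
d_along = adaptive_distance((5.0, 0.0), elongated)
d_across = adaptive_distance((0.0, 5.0), elongated)
```

For a single event the measure reduces to the Euclidean distance, while for the elongated cluster a point displaced along the long axis comes out closer than a point displaced by the same amount across it, which is the behavior the ellipsoid argument relies on.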
The resulting hierarchical tree, combined with a heatmap of CD marker expression, allows visualization of hierarchically clustered data with all 8 parameters displayed in a single plot, as compared to 28 traditional two-parameter plots. HCA has the major advantage of providing populations that are homogeneous in their expression pattern across all parameters (without the need for complex sub-gating or back-gating). We were able to identify populations corresponding to the different stages of B-cell development. In a normal control bone marrow we could detect the following candidate B-lineage progenitor populations: CD34+117+38+10−22−19− (0.94% of total) progenitor/stem cells, CD34+117−38+10+22+19med (0.26% of total) pro-B cells, CD34−117−38+10+22+19+ (2.77% of total) small pre-B cells (lower FSC values), CD34−117−38+10+22+19+ (1.09% of total) large pre-B cells (higher FSC values) and CD34−117−38lo10−22+19+ (5.94% of total) (immature) B cells. In 10 diagnostic or relapse samples, HCA clearly identified the main leukemic population. HCA is also able to visualize otherwise “hidden” populations. This was exemplified by a distinct CD38+ B-lin− population (most likely T cells) that overlapped with other populations in all 28 two-parameter plots. We have built a classifier able to find established populations across samples and in large datasets (10^6 events) for which HCA would be computationally too demanding. In summary, we show the advantages of using hierarchical clustering analysis for large, complex multiparameter flow cytometry datasets.
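The figure of 28 two-parameter plots follows from choosing 2 of 8 parameters. The short Python sketch below reproduces that count. The parameter list is an assumption: the six CD markers come from the abstract, while the two scatter channels (named `FSC` and `SSC` here) are the conventional choice and are not spelled out in the text.

```python
from itertools import combinations

# Six CD markers from the abstract plus two scatter channels.
# The scatter channel names are an assumption for illustration.
parameters = ["CD10", "CD19", "CD22", "CD34", "CD38", "CD117", "FSC", "SSC"]

def pairwise_plots(names):
    """Return every unordered pair of parameters, one per bivariate plot."""
    return list(combinations(names, 2))

plots = pairwise_plots(parameters)
print(len(plots))  # 28
```

The count grows quadratically with the number of parameters, which is why a single tree-plus-heatmap view becomes more attractive as panels get larger.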

Author notes

Disclosure: No relevant conflicts of interest to declare.
