Introduction
Single-cell RNA sequencing (scRNAseq) is commonly used to determine cell identity. Several scRNAseq analysis methods are available, but most use arbitrary scores to represent the similarity of a query cell to a reference profile and do not assess the statistical significance of those scores. We previously presented Single Cell Correlation Analysis (SCA), a method that identifies cells in a query scRNAseq dataset that are similar to a profile of interest (the reference profile). SCA differs from other methods in that it calculates the statistical significance of the correlation coefficient using a permutation-based false discovery rate (FDR) estimation. Here, we extended SCA to uniformly identify cells across multiple scRNAseq datasets. The new method is called SCA-Across Datasets (SCA-AD). We tested the performance of SCA-AD in validated scRNAseq datasets of normal murine and human bone marrow, benchmarked SCA-AD against other methods, and used SCA-AD to identify and compare self-renewing cells in human AML samples from adult and pediatric patients.
Methods and Results
SCA-AD uses a universal scoring system to assign cell identity across datasets. Spearman's correlation is used to score the similarity of each query cell to a reference profile. SCA-AD integrates a common background dataset into each query dataset, which ensures heterogeneity in the data and allows SCA-AD to establish a single threshold value for cell identity across the datasets. The threshold value is established using FDR calculations.
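The scoring and thresholding steps described above can be illustrated with a minimal sketch. This is not the SCA-AD implementation: the permutation scheme (shuffling the gene order of the reference profile), the function names, the default parameters, and the way the background cells are simply appended to the query cells are all assumptions made for illustration.

```python
import math
import random
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # constant vector: correlation undefined, treat as 0
    return cov / math.sqrt(vx * vy)


def fdr_threshold(cells, reference, fdr=0.05, n_perm=20, seed=0):
    """Lowest score threshold whose estimated FDR is at or below `fdr`.

    Assumed null model: correlations against the reference profile with its
    gene order shuffled. `cells` should already include the background cells.
    """
    rng = random.Random(seed)
    observed = [spearman(c, reference) for c in cells]
    null = []
    for _ in range(n_perm):
        shuffled = reference[:]
        rng.shuffle(shuffled)
        null.extend(spearman(c, shuffled) for c in cells)
    for t in sorted(set(observed)):
        n_called = sum(s >= t for s in observed)
        expected_false = sum(s >= t for s in null) / n_perm
        if expected_false / n_called <= fdr:
            return t, observed
    return None, observed  # no threshold meets the target FDR
```

In this sketch, each cell is a list of expression values in the same gene order as the reference profile, and a cell is called positive when its score is at or above the returned threshold. A return value of None for the threshold corresponds to the case in which no cell can be called at the requested FDR.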
We tested SCA-AD in validated scRNAseq datasets of normal murine and human bone marrow. In both cases, the self-renewing compartment was experimentally validated using gold-standard, in vivo self-renewal assays. We extracted a reference normal bone marrow self-renewal profile from the in vivo-validated murine scRNAseq dataset (Rodriguez-Fraticelli et al. Nature 2020). We used this reference profile to query a human scRNAseq dataset of normal bone marrow precursors with experimentally validated self-renewal capacity (Velten et al. Nature Cell Biology 2017). We compared the performance of SCA-AD to other commonly used methods: scmap-cluster (Kiselev et al. Nature Methods 2018), singleR (Aran et al. Nature Immunology 2019), scType (Ianevski et al. Nature Communications 2022), and AUCell (Aibar et al. Nature Methods 2017). SCA-AD matched or exceeded the sensitivity and precision of each of these methods when applied to the Velten dataset. At an FDR of 0.001, SCA-AD identified self-renewing cells with 63% sensitivity and 27% precision. All methods except scmap-cluster matched SCA-AD in sensitivity, and all of the methods displayed significantly lower precision (11-19%).
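For reference, sensitivity and precision in a benchmark of this kind can be computed from the set of cells a method calls positive and the set of experimentally validated cells. The sketch below uses the standard definitions; the function name and the cell identifiers are hypothetical and are not taken from the datasets above.

```python
def sensitivity_precision(called, truth):
    """Sensitivity = TP / validated cells; precision = TP / called cells."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    precision = true_positives / len(called) if called else float("nan")
    return sensitivity, precision


# Hypothetical toy input: four called cells, three validated cells.
sens, prec = sensitivity_precision(
    called={"c1", "c2", "c3", "c4"},
    truth={"c1", "c2", "c5"},
)
```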
To test whether SCA-AD can determine that the cell of interest is absent from the data, we removed all experimentally validated self-renewing cells from the Velten dataset. With the self-renewing cells omitted, all of the other methods still identified 27-100% of cells as self-renewing. In contrast, SCA-AD identified 5.9% of cells as self-renewing at an FDR of 0.001, demonstrating the very low false positive rate and high specificity of SCA-AD.
Finally, we applied SCA-AD to scRNAseq datasets of AML from 16 adult (van Galen et al. Cell 2019) and 13 pediatric (Zhang et al. Genome Biology 2023) patients. As the reference profile, we used the leukemia self-renewal profile that we previously defined and validated in vivo (Sachs et al. Cancer Research 2020). SCA-AD identified self-renewing cells in each of the 29 AML samples. In the adult samples, the self-renewing cells were most prevalent among GMPs and progenitors. In contrast, the self-renewing cells in the pediatric samples were more prevalent among GMPs and LMPPs, suggesting differences in the stem cell compartment between pediatric and adult AML. Pathway analysis revealed enrichment of Myc and mitochondrial profiles in the self-renewing cells.
Conclusions
We developed SCA-AD, a novel method to uniformly identify cells across datasets with a statistical confidence measure. SCA-AD performs well in experimentally validated datasets, can determine whether a cell of interest is absent from the data, and matches or exceeds the performance of existing methods. Finally, we used SCA-AD to compare features of self-renewing cells in adult and pediatric AML.
No relevant conflicts of interest to declare.