Poster Presentation 40th Annual Lorne Genome Conference 2019

Cell type prediction at single-cell resolution (#107)

Jose Alquicira 1,2, Anuja Sathe 3,4, Hanlee Ji 3,4, Quan Nguyen 1, Joseph Powell 2,5,6
  1. Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
  2. Garvan Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
  3. Department of Medicine, Stanford University School of Medicine, Stanford, California, USA
  4. Stanford Genome Technology Center, Stanford University, Stanford, California, USA
  5. Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  6. Faculty of Medicine, University of New South Wales, Sydney, New South Wales, Australia

Single-cell RNA sequencing (scRNA-seq) has enabled the characterization of highly specific cell types in many human tissues, as well as in both primary and stem cell-derived cell lines. An important facet of these studies is the ability to identify the transcriptional signatures that define a cell type or state. In principle, these signatures can be used to classify an unknown cell from its transcriptional profile. The ability to accurately predict a cell type, and any pathology-related state, will play a critical role in the early diagnosis of disease and in decisions about personalized treatment for patients. Here we present a new generalizable method (scPred) for predicting cell type(s), which combines unbiased feature selection from a reduced-dimension space with machine-learning classification. scPred addresses several problems associated with selecting individual genes as features, and is able to capture the subtle effects of many genes, increasing the overall variance explained by the model and correspondingly improving prediction accuracy. We validate the performance of scPred by classifying tumor versus non-tumor epithelial cells in gastric cancer and then confirming the predictions with independent molecular techniques (cyclic immunohistochemistry), achieving 99% accuracy in classifying the disease state of individual cells. Moreover, we apply scPred to scRNA-seq data from pancreatic tissue, colorectal tumor biopsies, and circulating dendritic cells, and show that it classifies cell subtypes with an accuracy of 96.1-99.2%. Collectively, our results demonstrate the utility of scPred as a single-cell prediction method that can be used for a wide variety of applications. The generalized method is implemented in software available at https://github.com/IMB-ComputationalGenomics-Lab/scPred/
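
The general idea of selecting features in a reduced-dimension space and then classifying cells can be illustrated with a small, self-contained sketch. This is not the scPred implementation, and the abstract does not specify which dimensionality reduction, selection criterion, or classifier scPred uses. The sketch assumes principal component analysis for the reduced space, a standardized mean difference (with a hypothetical threshold of 1.0) to pick informative components, and a nearest-centroid rule as a stand-in classifier. The data are simulated, with a small shift spread across many genes, and all function names are hypothetical.

```python
import random
import statistics

def make_cells(rng, n_cells, n_genes, shift):
    """Simulated expression: the first 20 genes are shifted by `shift` (assumption)."""
    return [[rng.gauss(shift if g < 20 else 0.0, 1.0) for g in range(n_genes)]
            for _ in range(n_cells)]

def center(rows, means):
    """Subtract per-gene training means from every cell."""
    return [[x - m for x, m in zip(row, means)] for row in rows]

def top_components(rows, k, rng, iters=200):
    """Top-k principal axes of centered data via power iteration with deflation."""
    n, p = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
           for i in range(p)]
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
        # Eigenvalue estimate (Rayleigh quotient), then remove this axis from cov.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
        for i in range(p):
            for j in range(p):
                cov[i][j] -= lam * v[i] * v[j]
        axes.append(v)
    return axes

def project(rows, axes):
    """Coordinates of each cell on each principal axis."""
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes] for row in rows]

def informative(scores, labels, threshold=1.0):
    """Keep components whose class means differ by more than `threshold` pooled SDs."""
    keep = []
    for j in range(len(scores[0])):
        a = [s[j] for s, y in zip(scores, labels) if y == 0]
        b = [s[j] for s, y in zip(scores, labels) if y == 1]
        pooled = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
        if abs(statistics.mean(a) - statistics.mean(b)) / pooled > threshold:
            keep.append(j)
    return keep

def centroids(scores, labels, keep):
    """Mean position of each class in the selected-component space."""
    out = {}
    for y in sorted(set(labels)):
        rows = [s for s, lab in zip(scores, labels) if lab == y]
        out[y] = [statistics.mean([r[j] for r in rows]) for j in keep]
    return out

def predict(score, cents, keep):
    """Assign the class whose centroid is closest in the selected components."""
    return min(cents, key=lambda y: sum((score[j] - c) ** 2
                                        for j, c in zip(keep, cents[y])))

rng = random.Random(0)
n_genes = 30
train = make_cells(rng, 40, n_genes, 0.0) + make_cells(rng, 40, n_genes, 1.0)
train_labels = [0] * 40 + [1] * 40
test = make_cells(rng, 20, n_genes, 0.0) + make_cells(rng, 20, n_genes, 1.0)
test_labels = [0] * 20 + [1] * 20

gene_means = [statistics.mean(col) for col in zip(*train)]
axes = top_components(center(train, gene_means), k=5, rng=rng)
train_scores = project(center(train, gene_means), axes)
keep = informative(train_scores, train_labels)
cents = centroids(train_scores, train_labels, keep)

# Held-out cells are centered with the training means and projected on the same axes.
test_scores = project(center(test, gene_means), axes)
predicted = [predict(s, cents, keep) for s in test_scores]
accuracy = sum(p == y for p, y in zip(predicted, test_labels)) / len(test_labels)
print(f"informative components: {keep}, held-out accuracy: {accuracy:.2f}")
```

In this toy setting no single gene separates the two classes well, but a component that pools the small shifts across many genes does, which is the intuition behind selecting features in the reduced space rather than gene by gene.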