Leukemia stem cells (LSCs) are a rare population in leukemia and are thought to cause relapse after conventional treatment. Thus, there is an urgent need to understand the underlying mechanisms and driver genes in LSCs in order to improve clinical outcomes. However, in acute lymphoblastic leukemia (ALL), the lack of consistent models and surface markers has prevented studies of LSCs from reaching generalizable conclusions.
Method: We searched the GEO database for datasets suitable for integrative transcriptomic analysis. The selected datasets were pre-processed with a standard pipeline (RNAsik) and analysed with edgeR. Functional annotation was then performed to obtain interpretable enriched pathways. Genes from species other than human were converted to their human homologs using the biomaRt package in order to identify a shared gene signature.
Results: Two datasets with comparable experimental designs (slow-cycling cells representing LSCs and fast-cycling cells representing non-LSCs) were identified and used in this analysis. Although they derive from different species and cell types, their expression profiles correlate significantly. Comparing LSCs with non-LSCs, 15 differentially expressed genes (DEGs) and 12 pathways are shared by the two datasets. Moreover, we found that information is lost from the shared DEGs because some genes have no homolog in the other species; for example, the human HLA family and the mouse H2 family share a similar function but are not homologs of each other. However, pathways related to these genes, such as immune response pathways, were recovered by pathway enrichment analysis.
Conclusion: Integrating multiple datasets alleviates dataset-specific biases and reveals a biologically meaningful 15-gene LSC signature. However, comparing only genes with homologs, which is common practice, overlooks certain genes. Complementary pathway enrichment analysis is therefore helpful, and here it uncovered significantly altered immune-related genes and pathways.
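
To illustrate how a homolog-based comparison can lose information, the following is a minimal Python sketch of the intersection step, not the pipeline used in the study (which relied on edgeR and biomaRt). The function name `shared_signature`, the placeholder gene symbols, and the toy homolog table are assumptions made for illustration only. The one deliberate detail is that the mouse H2 gene has no entry in the table, so it drops out of the shared list even though an HLA gene is present on the human side.

```python
def shared_signature(human_degs, mouse_degs, mouse_to_human):
    """Return (shared, unmapped).

    shared:   human symbols found in both DEG lists after homolog mapping.
    unmapped: mouse DEGs with no entry in the homolog table; these are
              silently excluded from any gene-level comparison.
    """
    mapped = {mouse_to_human[g] for g in mouse_degs if g in mouse_to_human}
    unmapped = sorted(g for g in mouse_degs if g not in mouse_to_human)
    shared = sorted(mapped & set(human_degs))
    return shared, unmapped


# Toy inputs for illustration only; these are not results from the study.
human_degs = {"GENE_A", "GENE_B", "HLA-A"}
mouse_degs = {"Gene_a", "Gene_c", "H2-K1"}
# Hypothetical homolog table; in practice this would come from a homology
# resource such as the one queried through biomaRt.
mouse_to_human = {"Gene_a": "GENE_A", "Gene_c": "GENE_C"}

shared, unmapped = shared_signature(human_degs, mouse_degs, mouse_to_human)
print(shared)    # ['GENE_A']
print(unmapped)  # ['H2-K1']
```

Reporting the unmapped genes alongside the shared ones makes the excluded set explicit, which is where a pathway-level comparison can recover signal that the gene-level intersection misses.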