Background
Transcriptome analysis has contributed significantly to our understanding of the processes involved in disease and development, but the heterogeneous nature of the samples and tissues under investigation has been largely neglected. Multiple computational approaches have been developed to infer the abundance of different cell types in heterogeneous samples, a task known as computational deconvolution [1]. Although potentially applicable to other RNA fractions, the available methods have been designed and tested on protein-coding genes (mRNAs) only. Using expression data of known and novel long non-coding RNAs (lncRNAs), circular RNAs (circRNAs), microRNAs (miRNAs) and mRNAs from RNA-sequencing data across 160 different normal cell types and 45 tissues from the RNA Atlas project [2], we investigated how well these additional RNA fractions perform in the computational deconvolution workflow.
Results
Tissues and cell types in the RNA Atlas dataset were matched based on the UBERON ontology. For each cell type, we defined cell-type-specific markers based on matching mRNA, lncRNA, miRNA and circRNA expression data. These markers were subsequently used to estimate the proportion of each cell type in each of the tissues through computational deconvolution. For any given tissue, we defined the “signal” as the sum of the estimated proportions of all its constituent cell types. This signal was computed separately for mRNA, miRNA, lncRNA and circRNA markers.
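The signal definition above can be illustrated with a minimal sketch. It assumes that deconvolution has already produced per-cell-type proportion estimates for a tissue and that the tissue's constituent cell types are known from the ontology matching. The function name `tissue_signal`, the cell type names and all numbers are hypothetical and serve only to show the arithmetic; they are not taken from the study.

```python
def tissue_signal(proportions, constituents):
    """Sum the estimated proportions of the cell types that belong to the tissue."""
    return sum(p for cell_type, p in proportions.items() if cell_type in constituents)


# Hypothetical deconvolution output for one tissue, one entry per RNA fraction.
# Each inner dict maps a cell type to its estimated proportion in the tissue.
estimates = {
    "mRNA": {"hepatocyte": 0.5, "endothelial cell": 0.25, "neuron": 0.25},
    "miRNA": {"hepatocyte": 0.25, "endothelial cell": 0.25, "neuron": 0.5},
}

# Hypothetical set of cell types matched to this tissue.
constituent_cell_types = {"hepatocyte", "endothelial cell"}

# Compute the signal separately for each RNA fraction.
signals = {
    fraction: tissue_signal(props, constituent_cell_types)
    for fraction, props in estimates.items()
}
print(signals)
```

In this toy case the mRNA markers assign more of the estimated composition to cell types that actually belong to the tissue, so they yield the higher signal.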
Conclusions
We found that mRNAs contained the most biological signal across tissues, closely followed by lncRNAs. Furthermore, despite their lower overall performance, both miRNAs and circRNAs can deconvolve specific tissues with higher accuracy than mRNAs and lncRNAs.
References