Disease states generally express specific RNA transcripts that are absent from normal cells. These transcript isoforms may be missing from the reference annotation, and short-read sequencing may fail to recover them accurately, either because of limited read length or because no appropriate genome reference is available. Long-read sequencing technologies offer the potential to capture the actual transcriptome operating in cells. However, accurate analysis of long reads remains challenging because of their error rates and because splicing variants of a transcript differ from each other by only short sequence stretches; hence, new analysis methods are needed.
We have developed RATTLE, a new method for the reference-free reconstruction of transcriptomes from Nanopore sequencing reads. RATTLE uses a new k-mer based similarity measure to cluster reads and quantify transcripts, performs error correction, and delineates alternative transcript isoforms from long reads without the need for a genome reference. Using experimental and simulated data, we show that RATTLE outperforms other methods at clustering and achieves detection of known splice sites similar to that of reference-based methods. Additionally, read correction with RATTLE improves the proportion of reads mapped to the genome and accurately recovers gene and transcript abundances. Direct quantification of transcriptomes, coupled to our tools SUPPA (Trincado et al. 2018) and SPADA (Climente-González et al. 2017), leverages long-read technologies for the study of transcriptome dynamics without the need for a reference genome.
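As an illustration only, the general idea of clustering reads by k-mer similarity can be sketched as follows. This is not RATTLE's similarity measure or implementation: the sketch assumes plain Jaccard similarity over k-mer sets with a greedy single-pass assignment, and the function names (`kmer_set`, `jaccard`, `greedy_cluster`), the value of `k`, and the `threshold` are hypothetical choices made for the example. It also ignores read orientation and sequencing errors, which a real method for Nanopore data would need to handle.

```python
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all k-mers (substrings of length k) in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two k-mer sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def greedy_cluster(reads: List[str], k: int = 4, threshold: float = 0.5) -> List[List[int]]:
    """Assign each read to the first cluster whose representative
    (the cluster's first read) is similar enough; otherwise start a new cluster.
    Returns clusters as lists of read indices."""
    representatives: List[Set[str]] = []  # k-mer set of each cluster's first read
    clusters: List[List[int]] = []        # read indices per cluster
    for idx, read in enumerate(reads):
        kmers = kmer_set(read, k)
        for rep, members in zip(representatives, clusters):
            if jaccard(kmers, rep) >= threshold:
                members.append(idx)
                break
        else:
            # No existing cluster was similar enough: open a new one.
            representatives.append(kmers)
            clusters.append([idx])
    return clusters


# Toy input: the first two reads differ by one base; the third is unrelated.
reads = [
    "ACGTACGTTAGCCGATAGGCTA",
    "ACGTACGTTAGCCGATAGGCTT",
    "TTTTGGGGCCCCAAAATTGGCCAA",
]
clusters = greedy_cluster(reads, k=4, threshold=0.5)
```

In this toy input the first two reads share most of their 4-mers and fall into one cluster, while the third read shares none and forms its own. Comparing each read only to a cluster representative keeps the number of comparisons low, at the cost of making the result depend on read order.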
RATTLE enables the improved characterization of transcriptomes operating in cells across multiple conditions and disease states directly from long-read sequencing, opening up the application of quantitative transcriptomics in cell models, non-model organisms, and individuals for which a genome reference is not available. Our method, together with the portability of Nanopore technology, will facilitate the systematic implementation of long-read transcriptomics in clinical and field work.