Poster Presentation 40th Annual Lorne Genome Conference 2019

Reference-free reconstruction and error correction of transcriptomes from Nanopore long-read sequencing (#146)

Ivan de la Rubia (1), Joel Indi (1,2), Eduardo Eyras (1,3)
  1. Pompeu Fabra University, Barcelona, Spain
  2. Instituto de Medicina Molecular, Universidade de Lisboa, Lisbon, Portugal
  3. Catalan Institution of Research and Advanced Studies, Barcelona, Spain

Disease states generally present specific RNA transcripts that do not exist in normal cells. These transcript isoforms may be absent from the reference annotation, and short-read sequencing may not recover them accurately, either because of the limited read length or because an appropriate genome reference is lacking. Long-read sequencing technologies offer the potential to obtain the actual transcriptome operating in cells. However, accurate analysis of long reads remains challenging because of their error rates and because transcript splicing variants differ from each other by only short sequence stretches; hence, new analysis methods are needed.

We have developed RATTLE, a new method for the reference-free reconstruction of transcriptomes from Nanopore sequencing reads. RATTLE uses a new k-mer based similarity measure to cluster reads and quantify transcripts, performs error correction, and delineates alternative transcript isoforms from long reads without the need for a genome reference. Using experimental and simulated data, we show that RATTLE outperforms other methods at clustering and detects known splice sites at a rate similar to reference-based methods. Additionally, read correction with RATTLE improves the proportion of reads mapped to the genome and accurately recovers gene and transcript abundances. Direct quantification of transcriptomes, coupled to our tools SUPPA (Trincado et al. 2018) and SPADA (Climente-González et al. 2017), leverages long-read technologies for the study of transcriptome dynamics without the need for a reference genome.
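
As a rough illustration of what clustering reads by k-mer similarity can look like, the following is a minimal sketch, not RATTLE's implementation. The abstract does not specify RATTLE's similarity measure, so plain Jaccard similarity over k-mer sets is swapped in here as an assumption, and the function names (kmer_set, jaccard, greedy_cluster), the k-mer size, and the threshold are all hypothetical.

```python
from typing import List, Set


def kmer_set(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def greedy_cluster(reads: List[str], k: int = 5,
                   threshold: float = 0.5) -> List[List[int]]:
    """Greedily assign each read to the first cluster whose representative
    (the cluster's first read) is similar enough; otherwise open a new one.
    Returns clusters as lists of read indices."""
    clusters: List[List[int]] = []
    representatives: List[Set[str]] = []
    for idx, read in enumerate(reads):
        kmers = kmer_set(read, k)
        for cluster, rep in zip(clusters, representatives):
            if jaccard(kmers, rep) >= threshold:
                cluster.append(idx)
                break
        else:
            # No existing cluster was similar enough: start a new cluster.
            clusters.append([idx])
            representatives.append(kmers)
    return clusters


if __name__ == "__main__":
    # Toy reads: the first two differ by one base, the third is unrelated.
    toy_reads = [
        "ACGTACGTTAGCTAGGCTA",
        "ACGTACGTTAGCTAGGCTT",
        "TTTTGGGGCCCCAAAATTTT",
    ]
    print(greedy_cluster(toy_reads))
```

In this toy setting the two near-identical reads would be expected to group together while the unrelated read forms its own cluster. Real Nanopore reads carry many more errors, so a practical measure would need to tolerate mismatched k-mers far better than a simple set overlap does.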

RATTLE enables the improved characterization of transcriptomes operating in cells across multiple conditions and disease states directly from long-read sequencing, opening up the application of quantitative transcriptomics in cell models, non-model organisms, and individuals for which a genome reference is not available. Our method, together with the mobility of Nanopore technology, will facilitate the systematic implementation of long-read transcriptomics in clinical and field work.

 

  1. Trincado JL, Entizne JC, Hysenaj G, Singh B, Skalic M, Elliott DJ, Eyras E. SUPPA2: fast, accurate, and uncertainty-aware differential splicing analysis across multiple conditions. Genome Biol. 2018 Mar 23;19(1):40.
  2. Climente-González H, Porta-Pardo E, Godzik A, Eyras E. The Functional Impact of Alternative Splicing in Cancer. Cell Rep. 2017 Aug 29;20(9):2215-2226.