Poster Presentation 40th Annual Lorne Genome Conference 2019

Deep Convolutional Neural Networks for Nanopore Signal Processing (#145)

Tansel Ersavas 1, James Ferguson 1, Huanle Liu 1, Oguzhan Begik 1, Morghan Lucas 1, Lilly Bojarski 1, Kirston Barton 1, Eva Novoa Pardo 1, Martin A Smith 1
  1. Garvan Institute of Medical Research, Darlinghurst, NSW, Australia

Third-generation sequencing technologies are set to disrupt genomics research. In particular, Oxford Nanopore Technologies (ONT) offers a highly portable single-molecule real-time sequencing platform with applications for all spheres of genomics. The raw signal data generated by ONT devices is typically converted to nucleotide sequences for subsequent analysis, a step that can introduce errors given the stochastic nature of the signal. This complicates certain bioinformatic analyses, such as demultiplexing barcodes in single-cell sequencing data, where base-calling errors in short (<20 nt) adapters drastically reduce demultiplexing yields. For instance, only 18% of nanopore reads are accurately demultiplexed using the RAGE-seq protocol (citation). Can this yield be improved by directly querying raw nanopore signal data? To answer this question, we barcoded and multiplexed four unique sequences on a MinION sequencing run. After sequencing, the signal corresponding to the barcodes was extracted and used to train a neural network classifier. Here, we present the results of this strategy, which yields over 80% barcode recovery with 98% accuracy. Accurate demultiplexing of raw nanopore data opens up new possibilities and applications where traditional base-called data falls short. We expect that this method will scale to thousands of barcodes, facilitating the analysis of single-cell, single-molecule sequencing data.
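
To illustrate the general idea of classifying barcodes directly from raw signal, the following is a minimal sketch, not the network presented on the poster. It assumes median/MAD normalisation of the current trace, a single convolutional layer with global max pooling, and a confidence threshold below which a read is left unassigned. All names (`normalise`, `conv1d_relu`, `classify`, `threshold`), the layer sizes, and the toy signal are hypothetical, and the weights are random rather than trained, so the output carries no biological meaning.

```python
import math
import random
import statistics


def normalise(signal):
    """Median/MAD scaling of a raw current trace (assumed preprocessing)."""
    med = statistics.median(signal)
    mad = statistics.median([abs(x - med) for x in signal]) or 1.0
    return [(x - med) / mad for x in signal]


def conv1d_relu(signal, kernels):
    """Valid 1D convolution followed by ReLU; one output channel per kernel."""
    channels = []
    for kernel in kernels:
        width = len(kernel)
        channels.append([
            max(0.0, sum(w * signal[i + j] for j, w in enumerate(kernel)))
            for i in range(len(signal) - width + 1)
        ])
    return channels


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(signal, kernels, dense, threshold=0.9):
    """Return (barcode index or None, class probabilities).

    A read is assigned only if the top probability reaches `threshold`;
    otherwise it is left unassigned (None).
    """
    channels = conv1d_relu(normalise(signal), kernels)
    features = [max(ch) for ch in channels]  # global max pooling
    logits = [sum(w * f for w, f in zip(row, features)) for row in dense]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return (best if probs[best] >= threshold else None), probs


# Hypothetical setup: 4 barcodes, 8 kernels of width 9, random untrained weights.
rng = random.Random(0)
N_BARCODES = 4
kernels = [[rng.gauss(0, 1) for _ in range(9)] for _ in range(8)]
dense = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(N_BARCODES)]

# Toy stand-in for an extracted barcode signal segment (not real data).
toy_signal = [rng.gauss(90, 12) for _ in range(300)]
label, probs = classify(toy_signal, kernels, dense)
print(label, [round(p, 3) for p in probs])
```

Under this assumed scheme, raising `threshold` trades recovery (the fraction of reads assigned to a barcode) for accuracy (the fraction of assigned reads that are correct), which is one plausible way to read the two figures reported above.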