Poster Presentation 40th Annual Lorne Genome Conference 2019

A synthetic reference unit for measuring quantitative features of genome biology (#240)

Andre Reis 1,2, Bindu Swapna Kanakamedala 2, Christopher Barker 2, Ira Deveson 1,2, Timothy Mercer 1,2,3
  1. St Vincent's Clinical School, Faculty of Medicine, UNSW, Sydney, New South Wales, Australia
  2. Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  3. Altius Institute for Biomedical Sciences, Seattle, Washington, United States

Next-generation sequencing experiments frequently aim to identify DNA or RNA sequences that differ in abundance between samples. However, technical biases can confound the accurate quantification of such sequences, potentially obscuring or inflating true biological variation. To address this issue, we have developed synthetic DNA spike-in standards (‘DNA ladder’) that provide an accurate internal quantitative scale against which to measure DNA abundance and its associated variation, both within and between sequencing datasets. Each DNA ladder comprises four unique sequence elements (600 bp) present at 1, 2, 4 and 8 copies. Because the elements are linked within a single molecule rather than mixed individually, copy number is encoded with absolute accuracy and technical biases affect the ladder homogeneously. Since DNA ladders are entirely artificial and bear minimal homology to natural sequences, they can be safely added to DNA samples before library preparation and sequenced together with them. Here, we describe an alignment-free strategy that uses our synthetic ladder as an internal reference to accurately compare the abundance of DNA sequences between samples. By measuring the variability of k-mer counts in the DNA ladder, for which the true copy number is known, it is possible to normalise counts of endogenous sequences between libraries and to identify those whose observed differences cannot be explained by technical variation, implying true differences in abundance between samples. We have validated this concept and demonstrated its efficacy in a range of experiments, including human whole-genome sequencing and metagenome shotgun sequencing.
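The alignment-free normalisation idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy 4-mers standing in for ladder elements, and the choice of a median per-copy count as the library scale factor are all assumptions made for illustration only.

```python
from collections import Counter
from statistics import median


def count_kmers(reads, k):
    """Count every k-mer across a list of read strings (no alignment step)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def ladder_scale(counts, ladder_kmers):
    """Estimate the count expected per single copy in one library.

    ladder_kmers maps each ladder k-mer to its known copy number (1, 2, 4 or 8).
    The median of count / copy number is an assumed, illustrative estimator.
    """
    per_copy = [counts[kmer] / copies for kmer, copies in ladder_kmers.items()]
    return median(per_copy)


def normalise(counts, scale):
    """Express raw k-mer counts in units of 'copies' using the ladder scale."""
    return {kmer: n / scale for kmer, n in counts.items()}


# Hypothetical ladder k-mers and their known copy numbers.
LADDER = {"AAAC": 1, "CCCG": 2, "GGGT": 4, "TTTA": 8}

# Toy k-mer counts for two libraries; library B is sequenced twice as deeply.
lib_a = Counter({"AAAC": 10, "CCCG": 20, "GGGT": 40, "TTTA": 80,
                 "ACGT": 30, "TGCA": 30})
lib_b = Counter({"AAAC": 20, "CCCG": 40, "GGGT": 80, "TTTA": 160,
                 "ACGT": 60, "TGCA": 120})

norm_a = normalise(lib_a, ladder_scale(lib_a, LADDER))
norm_b = normalise(lib_b, ladder_scale(lib_b, LADDER))

# "ACGT" differs only by sequencing depth; "TGCA" differs after normalisation.
for kmer in ("ACGT", "TGCA"):
    print(kmer, norm_a[kmer], norm_b[kmer])
```

In this toy example the raw count of "ACGT" doubles between libraries, but so does every ladder k-mer, so the normalised values agree. "TGCA" remains twofold higher in library B after normalisation, which is the kind of difference the ladder is meant to expose. A fuller treatment would also use the spread of ladder k-mer counts around their expected values to set a threshold for technical variation.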