Whole exome sequencing (WES) has been highly successful in identifying genetic variants that drive heritable disease and is being integrated into clinical services to guide diagnosis and treatment. While WES is well suited to quickly processing the key targets of common diseases that follow simple models of inheritance, it is of limited use for diagnosing complex diseases that may be influenced by regulatory or intergenic regions. Clinical whole genome sequencing (WGS) offers high-depth characterisation of the full suite of SNPs and small indels contributing to complex or rare diseases, and the potential to identify causal variants that are overlooked by generalised panels.
We critically evaluated the GATK best-practice workflow for clinical WGS using publicly available clinical standards characterised on orthogonal technologies. We assessed the accuracy of variant calling, reproducibility, data quality, data yield and processing time. We determined the diagnostic potential in known disease cases covering a range of variant types, and discussed the computational burden in comparison with WES.
WGS accurately identified the curated variants in the clinical standards and identified a further 63,022 variants within the Clinical Research Exome v2 capture regions, equating to a 58% gain in information. We identified 5.4 million intergenic variants and 5.1 million non-coding variants, consistent with published estimates. The current release of GATK was limited in calling large indels and deletions and is not designed to detect copy number variants (CNVs) or structural variants. Data integrity and quality checks were fast and easily automated to fit into processing pipelines. Whole genome approaches will be increasingly important for clinical developments aimed at personalising treatments, but the increase in data will burden the current curation system and pose additional financial challenges until WGS can be readily adopted.
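As a reading aid, the 58% figure can be related to the 63,022 additional variants with simple arithmetic. The sketch below assumes the gain is defined as additional variants divided by a baseline count of variants in the same capture regions; the abstract does not state this definition or give the baseline, so the function names and the implied baseline are illustrative only, and the result is approximate because 58% is a rounded value.

```python
def information_gain_percent(additional: int, baseline: float) -> float:
    """Percentage gain of `additional` variants relative to `baseline`.

    Assumed definition (not stated in the text): 100 * additional / baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * additional / baseline


def implied_baseline(additional: int, gain_percent: float) -> float:
    """Baseline count implied by a reported gain, under the same assumed definition."""
    if gain_percent <= 0:
        raise ValueError("gain_percent must be positive")
    return additional * 100.0 / gain_percent


# Values taken from the text; everything derived from them is an estimate.
ADDITIONAL_VARIANTS = 63_022
REPORTED_GAIN_PERCENT = 58.0

baseline = implied_baseline(ADDITIONAL_VARIANTS, REPORTED_GAIN_PERCENT)
print(f"Implied baseline variants in capture regions: about {baseline:,.0f}")
print(f"Round-trip gain: {information_gain_percent(ADDITIONAL_VARIANTS, baseline):.1f}%")
```

Under this assumed definition the baseline comes out at a little under 109,000 variants; a different definition of information gain would give a different figure.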