Poster Presentation 40th Annual Lorne Genome Conference 2019

Using protein structure and function information to predict and understand variant pathogenicity (#112)

David B Ascher 1
  1. University of Melbourne, Parkville, VICTORIA, Australia

One of the challenges in integrating genomic information into clinical practice is the characterisation of novel variants, many of which are classified as variants of uncertain significance. This is further complicated by the many different effects a mutation may have. We have developed a suite of programs that uses protein structural information to calculate the molecular consequences of coding variants on protein structure and function.
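As a minimal sketch of the kind of per-variant annotation described above, the Python below parses simple protein-level missense notation (for example `p.Arg42His`) and assembles a feature vector from per-residue annotations that are assumed to have been precomputed from a protein structure. The `ResidueAnnotation` fields, the feature order, the helper names and all numbers in the usage example are hypothetical illustrations, not the features, tools or values used by the suite described here. The sign convention for the predicted stability change is also an assumption, since tools differ.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches simple missense notation such as "p.Arg42His" only.
MISSENSE_RE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


@dataclass
class ResidueAnnotation:
    """Hypothetical per-residue annotations precomputed from a structure."""
    wild_type: str               # one-letter residue observed in the structure
    rel_solvent_access: float    # 0.0 = fully buried, 1.0 = fully exposed
    missense_tolerance: float    # hypothetical score; lower = less tolerant
    # Predicted stability change (kcal/mol) per mutant residue; the sign
    # convention (negative = destabilising here) is an assumption.
    ddg: Dict[str, float] = field(default_factory=dict)
    at_interface: bool = False   # residue contacts a binding partner


def parse_missense(hgvs_p: str) -> Tuple[str, int, str]:
    """Return (wild_type, position, mutant) as one-letter codes."""
    m = MISSENSE_RE.match(hgvs_p)
    if m is None:
        raise ValueError(f"not a simple missense variant: {hgvs_p!r}")
    wt, pos, mt = m.groups()
    if wt not in THREE_TO_ONE or mt not in THREE_TO_ONE:
        raise ValueError(f"unknown amino-acid code in {hgvs_p!r}")
    return THREE_TO_ONE[wt], int(pos), THREE_TO_ONE[mt]


def variant_features(hgvs_p: str,
                     annotations: Dict[int, ResidueAnnotation]) -> List[float]:
    """Build a structure-derived feature vector for one missense variant."""
    wt, pos, mt = parse_missense(hgvs_p)
    ann = annotations[pos]
    # Guard against numbering mismatches between variant and structure.
    if ann.wild_type != wt:
        raise ValueError(
            f"reference mismatch at {pos}: structure has {ann.wild_type}, "
            f"variant states {wt}")
    return [
        ann.rel_solvent_access,
        ann.missense_tolerance,
        ann.ddg[mt],                        # KeyError if no prediction exists
        1.0 if ann.at_interface else 0.0,
    ]


if __name__ == "__main__":
    # Made-up annotation for a single residue, for illustration only.
    annotations = {42: ResidueAnnotation("R", 0.1, 0.2, {"H": -1.5}, True)}
    print(variant_features("p.Arg42His", annotations))
```

Vectors built this way for many characterised variants, paired with their known classifications, would form the training table for a classifier.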

In the InSiGHT database for hereditary colorectal cancer and related diseases, over 30% of collated variants are of uncertain significance. By mapping characterised variants onto the protein structures, we showed that likely-pathogenic variants were located in regions with lower tolerance to missense mutation and had larger destabilising effects on protein structure and interactions. Using a Random Forest algorithm, we trained a predictive model that accurately distinguishes likely-pathogenic variants from both likely-benign variants (ROC AUC = 0.94) and population variants (ROC AUC = 0.99).
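The ROC AUC values quoted above summarise how well a model's scores rank variants of one class above variants of the other. The sketch below computes ROC AUC directly from that definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. It illustrates the metric only and is not the authors' pipeline; in practice the scores could come from the `predict_proba` output of a random forest such as scikit-learn's `RandomForestClassifier`, and scikit-learn's `roc_auc_score` returns the same value more efficiently.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every positive score with every negative score (O(n_pos * n_neg)).
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: 1 = likely pathogenic, 0 = likely benign (illustrative only).
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

A value of 1.0 means every positive outranks every negative, while 0.5 is no better than random ordering.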

This approach was applied to identify which patients with von Hippel-Lindau (VHL) disease are at risk of developing renal carcinoma, and is now used in the UK to guide patient management. In a prospective trial (n=3620), no patients classified as low risk developed renal carcinoma, whilst 92% of patients classified as high risk developed at least one. Analysis of the molecular consequences of mutations has also been used to guide patient stratification in clinical trials for Mendelian diseases, and has been integrated into a number of clinical programs, including variant prioritisation and characterisation with the Brazilian Ministry of Health and the identification of pyrazinamide resistance mutations with the Victorian TB Program.

We have demonstrated that protein structural and functional features are highly discriminatory between, and highly predictive of, pathogenicity classes. This information can provide a powerful and scalable approach to interpreting genomic data, understanding how variants relate to clinical outcomes, and guiding future drug development.