High-throughput sequencing has proven to be an effective tool in the diagnosis of many genetic disorders. Predicting the functional consequences of the variants it identifies, however, remains a significant challenge, and current approaches are of limited use in distinguishing pathogenic from benign variants. Missense variants are particularly difficult to interpret because of the wide range of effects they can have. Conservation-based approaches to predicting missense variant consequences are widely used; however, they rely on the suitability and depth of the aligned sequences, and may not identify regions of biological importance that are specific to humans.
Using gnomAD[1], the largest database of human standing variation, we have created the Missense Tolerance Ratio (MTR)[2], a sequence-based measure of intolerance to missense variation, and calculated it across more than 18,000 unique human genes. We have shown that patient-ascertained variants preferentially cluster in intolerant regions with low MTR scores.
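The calculation itself is not spelled out here. As a rough illustration only, the sketch below assumes the MTR compares the observed proportion of missense variants (among missense plus synonymous variants) in a sliding window of codons with the proportion expected from all possible single-nucleotide changes in that window. The function names, the per-codon count lists, and the default window size of 31 codons are assumptions made for this example, not details taken from this abstract.

```python
from typing import List, Optional


def mtr(obs_mis: int, obs_syn: int, poss_mis: int, poss_syn: int) -> Optional[float]:
    """Ratio of observed to expected missense proportion for one window.

    Returns None when the ratio is undefined (no observed variants, or no
    possible missense changes in the window).
    """
    obs_total = obs_mis + obs_syn
    poss_total = poss_mis + poss_syn
    if obs_total == 0 or poss_total == 0 or poss_mis == 0:
        return None
    observed_fraction = obs_mis / obs_total
    expected_fraction = poss_mis / poss_total
    return observed_fraction / expected_fraction


def sliding_mtr(
    obs_mis: List[int],
    obs_syn: List[int],
    poss_mis: List[int],
    poss_syn: List[int],
    window: int = 31,  # assumed window size in codons, for illustration
) -> List[Optional[float]]:
    """Per-codon scores from a centred sliding window over per-codon counts.

    Windows are truncated at the ends of the sequence.
    """
    n = len(obs_mis)
    half = window // 2
    scores: List[Optional[float]] = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        scores.append(
            mtr(
                sum(obs_mis[lo:hi]),
                sum(obs_syn[lo:hi]),
                sum(poss_mis[lo:hi]),
                sum(poss_syn[lo:hi]),
            )
        )
    return scores
```

Under this assumed definition, a score below 1 means fewer missense variants were observed than the sequence context would predict, which is read as intolerance; a score near 1 suggests missense variation is tolerated about as well as synonymous variation.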
We have also observed that the more intolerant regions cluster together within protein tertiary structures, which we propose can be used to identify important residues and interaction sites that other methods may not have detected as functionally relevant. By combining the MTR estimates with protein tertiary structures, we aim to create a novel and more sensitive measure of intolerance to missense variation, both for clinical use and for the discovery of novel functionally important features. We are creating an interactive web server, to be made freely available, through which MTR scores can be viewed mapped onto a protein's tertiary structure.