Recent advances in Next Generation Sequencing (NGS) technologies have facilitated the generation of massive amounts of genomic data, which in turn brings the promise that personalized medicine will soon become widely available. As a result, there is increasing pressure to develop computational tools to analyze and interpret genomic data. In this dissertation, we present a systematic approach for interrogating patients' genomes to identify candidate causal genomic variants of Mendelian and oligogenic diseases. To achieve this, we combine biomedical data from extensive biological experiments with machine learning techniques to build predictive models that rival the currently adopted approaches in the field. We integrate a collection of features representing molecular information about the genomic variants and information derived from biological networks. Furthermore, we incorporate genotype-phenotype relations by exploiting semantic technologies and automated reasoning over a cross-species phenotype ontology network obtained from human, mouse, and zebrafish studies. We perform an extensive evaluation of our first method, PhenomeNet Variant Predictor (PVP), on a large set of synthetic exomes and genomes covering diverse Mendelian diseases and phenotypes. Moreover, we evaluate PVP on a set of exomes from real patients with congenital hypothyroidism. We show that PVP outperforms state-of-the-art methods and provides a promising tool for accurate variant prioritization for Mendelian diseases. Next, we update the PVP method using a deep neural network architecture as a backbone for learning and illustrate the enhanced performance of the new method, DeepPVP, on synthetic exomes and genomes.
Furthermore, we propose OligoPVP, an extension of DeepPVP that prioritizes candidate oligogenic combinations in personal exomes and genomes by integrating knowledge from protein-protein interaction networks, and we evaluate its performance on synthetic genomes created from known disease-causing digenic combinations. Finally, we discuss some limitations and future steps for extending the applicability of our proposed methods to identify the genetic underpinnings of Mendelian and oligogenic diseases.
Imene Boudellioua is a Ph.D. student at the Computational Bioscience Research Center. Her research interests involve the application of machine learning and data mining algorithms to the functional annotation of various biological entities, including the prediction of protein functions and of genetic variants that cause human disease.
*Light refreshments will be provided.