Prioritizing Causative Genomic Variants by Integrating Molecular and Functional Annotations from Multiple Biomedical Ontologies
The dissertation focuses on developing novel computational methods to improve the diagnosis of patients with rare or complex diseases. By systematically relating human phenotypes resulting from gene function loss or change to gene functions and anatomical/cellular locations, the candidate aims to enhance the prediction and prioritization of disease-causing variants. These methods, leveraging graph-based machine learning and biomedical ontologies, demonstrate significant improvements over existing approaches. The presentation will include a systematic evaluation of the methods, demonstrating their ability to compensate for incomplete data and their applications in biomedicine and clinical decision-making. This research contributes to more effective methods for predicting disease-causing variants and advancing precision medicine, offering promising prospects for improved diagnostics and patient care.
Overview
Abstract
Whole-exome and genome sequencing are widely used to diagnose individual patients. However, despite their success, these approaches leave many patients undiagnosed. This may be because more disease genes and variants remain to be discovered, or because disease phenotypes are novel and arise from a combination of variants in multiple known disease-related genes. Recent rapid increases in available genomic, biomedical, and phenotypic data enable computational analyses that reduce the search space for disease-causing genes or variants and facilitate the prediction of causal variants. Artificial intelligence, data mining, machine learning, and deep learning have therefore become essential tools for identifying biological associations, including protein-protein interactions, gene-disease associations, and variant-disease associations. Predicting these associations is a critical step in diagnosing patients with rare or complex diseases.
In recent years, computational methods have emerged to improve gene-disease prioritization by incorporating phenotype information. These methods evaluate a patient's phenotype against a database of gene-phenotype associations to identify the closest match. However, inadequate knowledge of phenotypes linked with specific genes in humans and model organisms limits the effectiveness of the prediction. Information about gene product functions and anatomical locations of gene expression is accessible for many genes and can be associated with phenotypes through ontologies and machine-learning models. Incorporating this information can enhance gene-disease prioritization methods and more accurately identify potential disease-causing genes.
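The phenotype-matching idea described above can be illustrated with a minimal Python sketch. It is not the method developed in the dissertation: it uses plain Jaccard set overlap as a stand-in for the ontology-aware similarity measures that phenotype-based tools typically use, and the gene names, phenotype identifiers, and function names are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_genes(patient_phenotypes, gene_phenotypes):
    """Rank genes by phenotype overlap with the patient, best match first.

    Ties are broken alphabetically by gene name so the order is deterministic.
    """
    scores = {
        gene: jaccard(patient_phenotypes, terms)
        for gene, terms in gene_phenotypes.items()
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical gene-phenotype annotations (placeholder identifiers, not real data).
gene_phenotypes = {
    "GENE_A": {"PHENO:1", "PHENO:2", "PHENO:3"},
    "GENE_B": {"PHENO:3", "PHENO:4"},
    "GENE_C": set(),  # a gene with no known phenotype annotations
}

patient = {"PHENO:1", "PHENO:2"}
ranking = rank_genes(patient, gene_phenotypes)

for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setting, GENE_C can never score above zero because it has no phenotype annotations, which is the limitation noted above: a purely phenotype-based match cannot rank genes whose phenotypes are unknown, so additional information such as gene function or expression location is needed.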
This dissertation aims to address key limitations in gene-disease prediction and variant prioritization by developing computational methods that systematically relate human phenotypes arising from the loss or change of gene function to gene functions and to the anatomical and cellular locations of gene activity. To achieve this objective, this work focuses on crucial problems in the causative variant prioritization pipeline and presents novel computational methods that significantly improve prediction performance by leveraging large-scale background knowledge and integrating multiple techniques.
Specifically, this dissertation presents novel approaches that use graph-based machine-learning techniques to leverage biomedical ontologies and linked biological data as background knowledge graphs. The methods employ representation learning with knowledge graphs and introduce generic models that address computational problems in gene-disease association prediction and variant prioritization. I demonstrate that my approach is capable of compensating for incomplete information in public databases and of integrating efficiently with other biomedical data for similar prediction tasks. Moreover, my methods outperform other relevant approaches that rely on manually crafted features and laborious pre-processing. I systematically evaluate my methods and illustrate their potential applications for data analytics in biomedicine. Finally, I demonstrate how my prediction tools can be used in the clinic to assist geneticists in decision-making. In summary, this dissertation contributes to the development of more effective methods for predicting disease-causing variants and advancing precision medicine.
Brief Biography
Azza Althagafi is a Ph.D. candidate in Computer Science at the Computational Bioscience Research Center (CBRC), Bio-Ontology Research Group (BORG), under the supervision of Professor Robert Hoehndorf. She is also a faculty member in Computer Science at Taif University (TU), Taif, Saudi Arabia. She holds a Bachelor's degree in Computer Science from Umm Al Qura University (UQU), Makkah, Saudi Arabia, and a Master's degree in Computer Science from KAUST. Throughout her academic research at KAUST, Azza has developed a keen interest in the intersection of computer science and biology, with a specific focus on leveraging artificial intelligence techniques to understand the human genome and its connection to various diseases. Her research involves the development of new machine learning and deep learning methodologies, algorithm design, and building computational models to address bioinformatics challenges. In addition to her academic pursuits, Azza actively participates in conferences, both internationally and within the Kingdom, presents tutorials, and works as a teaching assistant. She has been honored with multiple awards, including the CEMSE Dean's List Award, the Best Poster Award at the Middle East Genetic and Metabolic Academy (MEGMA) conference, and a Poster Award at the KAUST Research Conference: AI in Medicine. Through her involvement in events, conferences, and several volunteer initiatives, Azza demonstrates her dedication to making a positive impact in her field and beyond.