Drug Repositioning through the Development of Diverse Computational Methods using Machine Learning, Deep Learning, and Graph Mining
In this dissertation, we combined artificial intelligence and machine/deep learning with chemical and biological properties to develop several computational methods to solve biomedical domain problems, specifically drug repositioning, and demonstrated their efficiencies and capabilities. We developed three network-based DTI prediction methods using machine learning, graph embedding, and graph mining. These methods significantly improved prediction performance, and the best-performing method reduced the error rate by more than 33% across all datasets compared to the best state-of-the-art method. Because it is more insightful to predict continuous values that indicate how tightly a drug binds to a specific target, we conducted a comparison study of current regression-based methods that predict drug-target binding affinities (DTBA). Our methods demonstrated their efficiency and capability by achieving high prediction performance and identifying therapeutic targets for several cancer types. We further conducted a lung cancer case study with findings that support the novel predicted targets.
Overview
Abstract
The rapidly increasing number of existing drugs with genomic, biomedical, and pharmacological data makes computational analyses possible, which reduces the search space for drugs and facilitates drug repurposing or repositioning (DR) to unveil new targets for existing drugs. Thus, artificial intelligence, machine learning (including deep learning), and data mining play a critical role in DR and have been used to identify biological interactions such as drug-target interactions (DTI), drug-drug interactions, drug-response, and drug-disease associations. The prediction of these biological interactions is seen as a critical phase needed to make drug development more sustainable. Furthermore, late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. In this dissertation, we address the key limitations associated with drug development and repositioning and seek to answer the main research question: Could we develop computational methods that contribute to different DR stages and significantly reduce the prediction error at each of these stages? To answer this question, this dissertation addresses three crucial problems in the DR pipeline and presents several novel computational methods developed for DR that significantly improve prediction performance by leveraging data and technique integration.
First, we developed three network-based DTI prediction methods using machine learning, graph embedding, and graph mining. These methods significantly improved prediction performance, and the best-performing method reduced the error rate by more than 33% across all datasets compared to the best state-of-the-art method. Second, because it is more insightful to predict continuous values that indicate how tightly a drug binds to a specific target, we conducted a comparison study of current regression-based methods that predict drug-target binding affinities (DTBA). We discussed how to develop more robust DTBA methods and subsequently developed Affinity2Vec, the first regression-based method that formulates the entire task as a graph-based problem and combines several computational techniques from feature representation learning, graph mining, and machine/deep learning without using 3D structural data of target proteins. Affinity2Vec reduced the mean square error and outperformed the state-of-the-art methods. Finally, since oncology-related drug development failure is associated with sub-optimal target identification, we developed the first DL-based computational method (OncoRTT) to identify cancer-specific therapeutic targets for the ten most common cancers worldwide. Implementing our approach required a suitable dataset that the computational method could use to identify oncology-related DTIs. We therefore created the OncologyTT datasets to build and evaluate our OncoRTT method. Our methods demonstrated their efficiency and capability by achieving high prediction performance and identifying therapeutic targets for several cancer types. We further conducted a lung cancer case study with findings that support the novel predicted targets.
Overall, in this dissertation, we combined artificial intelligence and machine/deep learning with chemical and biological properties to develop several computational methods to solve biomedical domain problems, specifically drug repositioning, and demonstrated their efficiencies and capabilities.
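The two headline quantities in the abstract, a relative reduction in error rate and the mean square error, can be illustrated with a minimal Python sketch. It assumes that "error rate" means one minus a performance score in [0, 1]; the abstract does not say which score is used, so that definition, the function names, and all numbers below are hypothetical and are not results from the dissertation.

```python
def relative_error_reduction(baseline_score, new_score):
    """Fraction by which the error shrinks, assuming error = 1 - score."""
    baseline_error = 1.0 - baseline_score
    new_error = 1.0 - new_score
    if baseline_error <= 0:
        raise ValueError("baseline score must be below 1.0")
    return (baseline_error - new_error) / baseline_error


def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative numbers only (not taken from the dissertation).
# A score rising from 0.85 to 0.90 shrinks the error from 0.15 to 0.10.
reduction = relative_error_reduction(0.85, 0.90)

# Hypothetical measured vs. predicted binding affinities for three pairs.
mse = mean_squared_error([5.0, 7.2, 6.1], [5.5, 7.0, 6.1])

print(f"relative error reduction: {reduction:.1%}")
print(f"mean squared error: {mse:.4f}")
```

Under this assumed definition, a seemingly small gain in score can correspond to a large relative drop in error, which is why error-rate reduction is often reported alongside raw scores.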
Brief Biography
Maha A. Thafar is a Computer Science Ph.D. candidate and researcher under the supervision of Professor Xin Gao. She is a member of the Computational Bioscience Research Center (CBRC) and CMSE Division at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. She obtained her MSc degree in CS from Kent State University, Kent, Ohio, USA, in 2015. Prior to her graduate studies, she worked as a teaching assistant in the Computer Science Department, Taif University, Saudi Arabia, for several years. Before that, she earned her Bachelor's degree in CS from King Abdulaziz University, Jeddah, Saudi Arabia. She works at the intersection of computer science and biomedicine. Her current research interests include developing novel computational methods using artificial intelligence, machine/deep learning, and data mining for solving computational problems in the biomedical and healthcare domains, specifically in computational drug repositioning.