Novel Data Mining Methods for Virtual Screening of Biological Active Chemical Compounds by Othman Soufan

Lecture Hall 2, Building 9


Drug discovery is a process that takes many years and hundreds of millions of dollars to reach a confident conclusion about a specific treatment. Part of this sophisticated process is based on preliminary investigations that suggest a set of chemical compounds as candidate drugs for the treatment. Computational resources have played a significant role in this part through a step known as virtual screening. From a data mining perspective, the availability of rich data resources is key to training prediction models. Yet the difficulties imposed by the rapid growth of data and its dimensionality are inevitable. In this thesis, I address the main challenges that arise when data mining techniques are used for virtual screening. To achieve efficient virtual screening using data mining, I start by addressing the problem of feature selection and provide an analysis of the best ways to describe a chemical compound for enhanced screening performance. High-throughput screening (HTS) assay data used for virtual screening are characterized by a great class imbalance. To handle this problem, I suggest using a novel algorithm called DRAMOTE to narrow down promising candidate chemicals aimed at interaction with specific molecular targets before they are experimentally evaluated. Existing works are mostly proposed for small-scale virtual screening based on a few thousand interactions. Thus, I propose enabling large-scale (or big) virtual screening through learning millions of interactions while exploiting any relevant dependency for better accuracy. A novel solution called DRABAL, which incorporates structure learning of a Bayesian network as a step to model the dependency between the HTS assays, is shown to achieve significant improvements over existing state-of-the-art approaches.

Brief Biography

Othman Soufan is a Ph.D. Candidate at King Abdullah University of Science and Technology (KAUST). He received his B.Sc. degree in Management Information Systems from King Fahd University of Petroleum and Minerals (KFUPM) in 2010 with first class distinction. He then joined KAUST and received his M.Sc. degree in Computer Science in 2012. His research interests include developing machine learning and data mining techniques to address challenging problems in computational biology and biomedical applications. His research work has resulted in several peer-reviewed publications in high-quality journals and one filed patent. Othman has received a postdoctoral fellowship award from McGill University in Canada, where he will pursue his interest in data mining and bioinformatics.

More Information:

For more info contact: Ph.D. Candidate Othman Soufan; email:
