Despite advances in sequencing technologies, around 98% of the genome is usually disregarded due to the lack of interpretation methods. Here, I compare different sequence-based deep-learning approaches for predicting the functionality of the non-coding genome. Using the largest non-coding variant database, I tested how the predictions changed when pathogenic versus benign variants were introduced. Then, I benchmarked the models' performance on different genomic regions and phenotypes and built a logistic regression model for cell- and phenotype-specific track selection. The models outperformed state-of-the-art evolutionary- and variant-based methods. Finally, I compared different target-gene annotation databases using ontology-based Resnik’s semantic similarity. I combined the previous steps into a variant-to-phenotype or phenotype-to-variant workflow and applied it to rare variants.
Hatoon Ali is a Master's student in the Bio-Ontology Research Group (BORG) at King Abdullah University of Science and Technology (KAUST) under the supervision of Prof. Robert Hoehndorf. Ali's research focuses on studying genetic links to human disease, with a particular interest in utilizing sequence-based prediction models to prioritize non-coding variants. She is also working on customizing bioinformatics tools for special populations.
Prior to joining KAUST, Ali earned a Bachelor's degree in Clinical Laboratory Science from King Saud University in 2018.