Sequencing has identified millions of somatic mutations in human cancers. Distinguishing cancer driver genes among these millions of candidate mutations remains a major challenge. Accurate identification of driver genes and driver mutations is critical for advancing cancer research and for personalizing treatment based on reliable stratification of patients. Due to inter-tumor genetic heterogeneity, many driver mutations within a gene occur at low frequencies, which makes it challenging to distinguish them from non-driver mutations. Motivated by these challenges, we developed a novel method for identifying cancer driver genes. Our approach uses multiple complementary types of information as features, specifically cellular phenotypes, cellular locations, functions, and whole-body physiological phenotypes. We demonstrate that our method can accurately identify known cancer driver genes and distinguish between their roles in different types of cancer. In addition to identifying known driver genes, we identify several novel candidate driver genes. We provide an external evaluation of the predicted genes using a dataset of 26 nasopharyngeal cancer samples that underwent whole exome sequencing. We find that the predicted driver genes have a significantly higher rate of mutation than non-driver genes, both in publicly available data and in the nasopharyngeal cancer samples we use for validation. Additionally, we characterize the sub-networks of genes that are jointly involved in specific tumors.
Sara Althubaiti is a Master's student in the Bio-Ontology Research Group (BORG) at King Abdullah University of Science and Technology. Her interests are bioinformatics, text mining, ontologies, and cancer. Her research focuses on applying machine learning methods in cancer biology and development, specifically to find driver genes and mutations in cancer using genomic and transcriptomic data.