Computing solutions for biological problems

Xin Gao (left) often collaborates with structural biologist Stefan Arold. Their most recent project led to a computational pipeline that can help pharmaceutical companies discover new protein targets for existing, approved drugs.
© 2018 KAUST

Producing research outputs that have computational novelty and contributions, as well as biological importance and impacts, is a key motivator for computer scientist Xin Gao. His group at KAUST has experienced a recent explosion in its publications. Since January 1, 2018, the group has produced 27 papers, including 11 published in the top three computational biology journals and seven presented at the top artificial intelligence and bioinformatics conferences.

Originally from China, Gao joined KAUST in 2010 after a stint with the University of Waterloo in Canada and a prestigious fellowship at Carnegie Mellon University in the U.S. His group collaborates closely with experimental scientists to develop novel computational methods to solve key open problems in biology and medicine, he explains. “We work on building computational models, developing machine-learning techniques, and designing efficient and effective algorithms. Our focus ranges from analyzing protein amino acid sequences to determining their 3D structures to annotating their functions and understanding and controlling their behaviors in complex biological networks,” he says.

Gao describes one-third of his lab’s research as methodology driven, where the group develops theories and designs algorithms and machine-learning techniques. The other two-thirds is driven by problems and data. One example of his methodology-driven research is work on improving non-negative matrix factorization (NMF), a dimension-reduction and data-representation tool made up of a group of algorithms that decompose a complex dataset, expressed as a matrix, into the product of two smaller non-negative matrices.

NMF is used to analyze samples that have many features, not all of which may be important for the purpose of the study. It breaks down the data to display patterns that can indicate importance. Gao’s team improved on NMF by developing max-min distance NMF (MMDNMF), which runs through a very large amount of data to highlight the high-order features that describe a sample more efficiently.
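For readers who want to see the basic decomposition in action, here is a minimal sketch of plain NMF, not the team's MMDNMF. It assumes the widely used multiplicative update rules for squared error and runs on made-up toy data. The function names, matrix sizes and iteration count are illustrative choices, not details from Gao's work.

```python
import numpy as np


def nmf(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V (features x samples) as W @ H.

    W (features x rank) holds the learned patterns; H (rank x samples)
    holds each sample's weights on those patterns. This is standard NMF
    with multiplicative updates, used here purely for illustration.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Start from random positive factors.
    W = rng.random((n_features, rank)) + eps
    H = rng.random((rank, n_samples)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio, so W and H
        # stay non-negative; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(V, W, H):
    """Frobenius-norm reconstruction error relative to the size of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Toy data (hypothetical): 40 samples with 100 features each, built from
# 5 hidden non-negative patterns.
rng = np.random.default_rng(42)
V = rng.random((100, 5)) @ rng.random((5, 40))

W, H = nmf(V, rank=5)
print("pattern matrix W:", W.shape)
print("sample weights H:", H.shape)
print("relative reconstruction error:", relative_error(V, W, H))
```

Each column of H is a five-number summary of one sample, which is the sense in which NMF reduces dimension. Each column of W is one of the patterns those numbers refer to.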

To demonstrate their approach, Gao’s team applied the technique to human faces, using the images of 11 people with different expressions. Each image was treated as a sample with 1,024 features. After MMDNMF was trained to derive data representing the features of each face, it could assign a black-and-white facial image to the correct person more accurately than traditional NMF.

Opening biology’s Pandora’s box

Gao has many successful collaborations with KAUST researchers, but he says one of the most successful is with structural biologist Stefan Arold.

Together, they have worked on several projects, including one that has led to a computational pipeline that can help pharmaceutical companies discover new protein targets for existing, approved drugs.

“Drug repositioning is commercially and scientifically valuable,” explains Gao. “It can reduce the time needed for drug development from 20 to 6 years, and the costs from around 2 billion USD to 300 million USD. The National Institutes of Health in the United States estimates that 70 percent of drugs on the market can potentially be repositioned for use in other diseases.”
