Big Data Analyses in Evolutionary Biology

Engineering Science Hall (Building 9), Level 2, Lecture Hall 2

The Computational Bioscience Research Center (CBRC) at King Abdullah University of Science and Technology (KAUST) is pleased to announce the KAUST Research Conference on Big Data Analyses in Evolutionary Biology.

Recent advances in evolutionary biology are greatly impacted by the production of massive "omics" data sets, such as genomes, transcriptomes, proteomes and metabolomes, that are now available for evolutionary studies. At the same time, the size and complexity of these data make the Big Data paradigm essential for their analysis. Moreover, comparative analyses are central to evolutionary studies, which creates a further need for state-of-the-art bioinformatics technologies that can underpin both research and technology development.

CBRC is uniquely positioned to organize this Conference because of our commitment to research in Big Data analyses in the life sciences, complemented by sophisticated tool development and database construction that support such research. In these domains, we at CBRC focus on (a) how Big Data are handled in studies of evolutionary biology, and (b) what kinds of bioinformatics tools these studies need. These topics will be discussed in depth for a range of organisms, as selected by the conference Program Committee.

In addition, a number of biology-inspired computational algorithms have emerged over the past several decades in the areas of artificial intelligence, modeling, optimization and simulation. Examples include genetic algorithms, ant colony optimization algorithms, swarm theory and artificial neural networks. At this conference, we want to discuss not only what computer science can bring to biology in general and evolutionary biology in particular, but also what lessons learned from these fields can offer to applications and research in the computational sciences.

We strongly believe that the Conference will provide a platform for an active exchange of opinions that will advance research and development in Big Data analyses and bioinformatics focused on evolutionary biology.

On behalf of the Organizing and Program Committees, we would like to welcome all of the participants. As always, we look forward to a productive exchange of ideas and hope that you will enjoy your stay at KAUST and in Saudi Arabia.

Vladimir Bajic, CBRC Director and Conference Co-Chair
Takashi Gojobori, Associate CBRC Director and Conference Co-Chair


Day 1
  • 8:00 am: Coffee & Breakfast
  • 8:30 am: Registration (Building 9)
  • 9:00 am: Welcome address (Vladimir Bajic, Director, Computational Bioscience Research Center, KAUST)
  • 9:05 am: Opening remarks (Mootaz Elnozahy, Dean, Computer, Electrical and Mathematical Sciences and Engineering Division, KAUST)

Session 1: Computational Approaches and Bioinformatics in Evolutionary Biology (Chair: Robert Hoehndorf)

Session 2: Functional Evolution and Population Studies (Chair: Takashi Gojobori)

Day 2
  • 8:30 am: Coffee & Breakfast (Building 9 - Lobby)

Session 3: Environmental Adaptation in Evolutionary Biology and Ecology (Chair: Stefan Arold)

Session 4: Biologically-inspired Computational Methods (Chair: Vladimir Bajic)

Day 3
  • 8:45 am: Coffee & Breakfast (Building 9 - Lobby)

Session 5: Spotlight on Young Talent (Chair: Xin Gao)

  • 10:30 am: Coffee break (Building 9 - Lobby)

Session 6: Industrial and Medical Applications (Chair: John Archer)


Talk Details

KEYNOTE LECTURE: The Impact of Admixture between Modern and Archaic Humans

The genomes of archaic and early modern humans offer a unique window into their histories. However, the sequencing and analysis of DNA from archaic humans is complicated by DNA degradation, chemical modifications and contamination. Recent technological advances have made it possible to recover nuclear DNA sequences from a number of archaic and early modern humans, and the resulting whole-genome sequences have yielded a number of important insights. Comparing archaic genome sequences with those of present-day humans has allowed us to identify sequence differences that have come to fixation or reached high frequency in modern humans since their divergence from Neandertals and Denisovans, some of which may have important functional effects in modern humans. Further, ancient genomes have provided direct evidence that archaic and early modern humans interbred, and that this interbreeding left between 1% and 6% archaic DNA in the genomes of present-day non-Africans. This introgressed DNA has been shown to have both positive and negative outcomes for present-day carriers, underlying apparently adaptive phenotypes as well as influencing disease risk. I will discuss recent work in which we identified haplotypes that are likely of Neandertal origin and determined their likely functional consequences using public genome, gene expression, and phenotype datasets.


Janet Kelso has a broad interest in quantitative approaches to understanding genome evolution and the molecular basis of phenotypic variation and disease susceptibility. Her group has worked extensively on the analysis of ancient genomes and has a special interest in developing novel software for processing and analyzing high-throughput sequence data from ancient samples, and in using computational approaches to gain insights into genome evolution.

Janet received her Ph.D. in bioinformatics from the South African National Bioinformatics Institute at the University of the Western Cape under the supervision of Professor Winston Hide. Her doctoral work developed an ontology for classifying gene expression data, for which she won a L'Oréal Women in Science fellowship. She is the author of more than 80 peer-reviewed scientific publications. Together with Alfonso Valencia, Janet is co-Editor-in-Chief of the journal Bioinformatics, and she is also an editor of the journal Database. She is an active member of the Board of the International Society for Computational Biology.


HOCOMOCO: a Collection of Transcription Factor Binding Models for Human and Mouse Based on ChIP-seq Data and its Application in Genetics and Systems Biology

HOCOMOCO (HOmo sapiens COmprehensive MOdel Collection) is a manually curated collection of position weight matrix (PWM) models of the binding sites of 680 human and 453 mouse transcription factors (TFs). HOCOMOCO is based mostly on ChIP-seq data, which appear to be the most informative about the specificity of TF binding in vivo. We used five thousand ChIP-seq experiments as raw data. The datasets were taken from the GTRD database, where they had been uniformly processed within the BioUML framework using several ChIP-seq peak-calling tools. ChIPMunk software was used for systematic motif discovery from different peak sets, and the motifs that best separated the test (ChIP-seq peaks) and control datasets were selected. To reduce the number of irrelevant motifs arising from indirect binding, we performed extensive computational assessment and manual curation of the motifs found. As valid models, we selected those that were (i) similar to already known motifs, (ii) consistent within a TF family, or, at least, (iii) showed a clear consensus (assessed manually from the LOGO representation). The current version of HOCOMOCO (v11) includes 1,302 mononucleotide and 576 dinucleotide PWMs, which describe the primary binding motif of each TF as well as reliable alternative binding specificities. An interactive interface and bulk downloads are available on the HOCOMOCO website. The database can be used to explore a range of problems in genetics, medicine, and systems biology, and can support studies on the evolution of TF binding sites.
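To make the idea of a mononucleotide PWM concrete, here is a minimal Python sketch that builds such a matrix from aligned sites and uses it to scan a sequence. It assumes log2 odds against a uniform background with a fixed pseudocount. The example sites and the function names (`pwm_from_sites`, `score_word`, `best_hit`) are hypothetical. This is not how HOCOMOCO or ChIPMunk derive or store their models.

```python
import math

BASES = "ACGT"


def pwm_from_sites(sites, pseudocount=0.25):
    """Build a log2-odds PWM from aligned, equal-length binding sites.

    Assumes a uniform 0.25 background for each base (illustrative choice).
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def score_word(pwm, word):
    """Sum the per-position log-odds for a word as long as the motif."""
    return sum(column[base] for column, base in zip(pwm, word))


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    starts = range(len(sequence) - width + 1)
    start = max(starts, key=lambda i: score_word(pwm, sequence[i:i + width]))
    return start, score_word(pwm, sequence[start:start + width])


# Hypothetical aligned sites for an AP-1-like motif.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pwm = pwm_from_sites(sites)
print(best_hit(pwm, "GGGGTGACTCAGGGG"))
```

Positive scores mean a window looks more like the sites than like the background. Real collections also calibrate score thresholds, which this sketch omits.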


Vsevolod Makeev is currently the Head of the Department of Computational Systems Biology at the Vavilov Institute of General Genetics, Moscow, Russia.

Vsevolod Makeev received his diploma in Physics (1990), followed by a Ph.D. in Physics and Mathematics (1996) and a Dr. Sci. in Physics and Mathematics (2010), from Moscow State University, Moscow, Russia. From 2001 to 2010, he headed the Laboratory of Bioinformatics at the State Research Centre of Genetics and Selection of Industrial Microorganisms, Moscow, Russia.

Vsevolod is a specialist in algorithm development for molecular biology and sequence data analysis. With his collaborators, he helped develop motif finders such as the SeSiMCMC Gibbs sampler and ChIPMunk. His lab hosts databases of DNA motifs bound by regulatory proteins: HOCOMOCO (for human and mouse) and iDMMPMM (for Drosophila). In 2016, Vsevolod was elected a Corresponding Member of the Russian Academy of Sciences.


Plant Translational Research in Big Data Era

By 2050 the world's population is expected to reach 9.1 billion, and food demand to increase by 70% (FAO, 2009). To answer this “Nine billion question,” it is imperative to improve the interdisciplinary nature of crop production. In this seminar, we will discuss the use of Big Data from high-throughput phenotyping and genomics to advance our knowledge of the genetic mechanisms underlying stress adaptation in crop plants.
We performed our experiment at The Plant Accelerator®, a high-throughput phenotyping (HTP) platform that provides non-destructive quantitative measurements of plant development over time. We used HTP to phenotype rice, one of the most important cereals, under salt stress conditions. HTP enabled us to obtain repeated measurements over time of plant growth and architecture, as well as water loss, under control and stress conditions. The sheer volume of data resulting from daily measurements is a major bottleneck for further analysis. To overcome this challenge, we are currently exploring new methods to analyze the collected data and to perform genetic analyses that identify genetic signatures of the response to abiotic stress.
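One common way to condense daily image-based growth measurements is the relative growth rate (RGR), the change in the natural log of plant size per unit time. The sketch below assumes that size is recorded as a projected shoot area. The numbers are made-up illustrations, not data from this study. The salt-to-control ratio (`rgr_ratio`) is a hypothetical summary, not the authors' tolerance metric.

```python
import math
from statistics import mean


def relative_growth_rates(days, areas):
    """RGR for each interval: (ln A2 - ln A1) / (t2 - t1)."""
    rates = []
    for i in range(1, len(days)):
        rates.append(
            (math.log(areas[i]) - math.log(areas[i - 1])) / (days[i] - days[i - 1])
        )
    return rates


def rgr_ratio(salt_rgr, control_rgr):
    """Hypothetical summary: mean RGR under salt relative to control."""
    return mean(salt_rgr) / mean(control_rgr)


days = [0, 1, 2, 3]
control_area = [100.0, 200.0, 400.0, 800.0]  # doubles daily (illustrative)
salt_area = [100.0, 150.0, 225.0, 337.5]     # grows 1.5x daily (illustrative)

control = relative_growth_rates(days, control_area)
salt = relative_growth_rates(days, salt_area)
print(rgr_ratio(salt, control))  # about 0.585, i.e. ln(1.5) / ln(2)
```

Working on the log scale makes RGR independent of initial plant size. This is why it is often preferred over absolute growth when plants of different sizes are compared.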


Sónia Negrão is a Research Scientist in the Biological, Environmental Sciences & Engineering Division at King Abdullah University of Science and Technology (KAUST), Saudi Arabia.

Dr. Sónia Negrão obtained her M.Sc. in "Agronomy - Crop Breeding" in 2002 at Instituto Superior de Agronomia in Portugal. She obtained her Ph.D. degree at Instituto de Tecnologia Química e Biológica - Universidade Nova de Lisboa (ITQB-NOVA) in Portugal in 2008. Dr. Negrão's Ph.D. focused on rice breeding using marker-assisted selection and was undertaken in collaboration with the International Rice Research Institute, Philippines. As a post-doctoral fellow at ITQB-NOVA, she characterized rice genetic variation in response to salinity. In 2013, Sónia became a research scientist at The Salt Lab at King Abdullah University of Science and Technology, Saudi Arabia. Since then, she has been involved in several projects related to high-throughput phenotyping, genomics and association mapping, as well as Big Data. She has an extensive network of international collaborators, including the International Rice Research Institute, AfricaRice, the University of Adelaide and CIRAD. She has the capacity to attract significant funding and to coordinate several national and international projects related to genetic variation, genomics and salinity tolerance. Dr. Negrão uses genomics to accelerate breeding and address problems related to climate change, and is focused on high-quality science that will ultimately lead to sustainable crop improvement.


The Draft Genomes of Three Wild Tomato Species: Progress & Insights

Solanum section Lycopersicon is an economically important clade of 14 species that includes the cultivated tomato, Solanum lycopersicum, one of the most important horticultural crops. Here, we sequenced the genomes of three wild tomato species: S. pimpinellifolium, S. galapagense and S. cheesmaniae.


Salim Bougouffa received his doctorate in Biotechnology in 2010 from the University of Manchester in the UK. He joined the laboratory of Professor Pei-Yuan Qian at the Hong Kong University of Science and Technology, where he worked in microbial ecology, using NGS technology to study microbial communities from different environments. In 2013, Salim joined the lab of Professor Vladimir Bajic, where he continues to work on similarly themed research. More recently, Salim has also worked on plant genomes, such as those of wild tomatoes, to establish the genomic basis of their stress-tolerance traits.


KEYNOTE LECTURE: From Phylogeny to Phylomedicine

Nature has been the greatest experimenter on Earth for millennia. New mutations continuously arise in our genomes, and their fate is determined by purifying selection, genetic drift, and positive selection. Comparative sequence analysis at the individual, population, and species levels yields a record of their outcomes in the form of patterns of conservation and divergence of genomes. These evolutionary patterns and their underlying causes are now the foundation of many approaches to forecast adaptive and disruptive mutations found in our personal and somatic genomes. Phylomedicine, which encompasses predictive evolutionary techniques and the associated fundamental research, is becoming a key discipline at the intersection of molecular evolution, genomics, and biomedicine. I will present highlights of our recent research in the phylomedicine of Mendelian, cancer, and complex diseases.
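The following toy example illustrates the idea that patterns of conservation across species can inform forecasts about human mutations. The sketch scores one alignment column by Shannon entropy and applies a deliberately naive rule. The rule, its labels and the example columns are hypothetical simplifications for illustration, not the phylomedicine methods presented in this talk.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def naive_forecast(column, mutant):
    """Toy rule: residues seen in other species are assumed tolerated;
    novel residues at invariant positions are flagged."""
    if mutant in column:
        return "tolerated"
    if column_entropy(column) == 0:
        return "possibly damaging"
    return "uncertain"


# One column per position: the residue observed in each of eight species.
invariant = "LLLLLLLL"
variable = "LLLLIIII"
print(naive_forecast(invariant, "P"))  # possibly damaging
print(naive_forecast(variable, "I"))   # tolerated
print(naive_forecast(variable, "P"))   # uncertain
```

Practical tools weight species by their evolutionary relationships and divergence times rather than treating every sequence equally, as this sketch does.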



Sudhir Kumar is currently the Laura H. Carnell Professor and the Director of the Institute for Genomics and Evolutionary Medicine at Temple University.

Sudhir Kumar has been an early leader in exploring the theoretical and empirical intersection of evolutionary biology with computational biology, and in building accessible tools that allow researchers from diverse backgrounds to harness the analytical power of modern computational biology. With a background in Biological Sciences and Electrical & Electronics Engineering from the Birla Institute of Technology and Science, he completed a Ph.D. and postdoctoral work in Genetics at Pennsylvania State University. During this period, he worked to develop the first version of Molecular Evolutionary Genetics Analysis (MEGA), a freely accessible software package that has been maintained and improved for more than 20 years since its release. The enduring popularity of MEGA reflects Kumar's responsiveness to community needs and his dedication to accessibility and scientific rigor. He has made numerous contributions to the mathematical theory of phylogenetics through advances in estimating evolutionary distances, inferring divergence times, and algorithms for constructing phylogenetic trees. Kumar and his laboratory continue to work actively on improving phylogenetic theory and its applications to the growing field of phylomedicine, which explores disease via phylogenetic methods and makes predictions informed by evolutionary biology.

His work has been cited more than 120,000 times. One of his scientific articles was included in the Thomson Reuters Web of Science top 100 most-cited papers of all time and designated the top article of the decade by Scopus, the database of peer-reviewed literature. He has published numerous citation classics and hot papers. He received an Innovation Award in Functional Genomics from the Burroughs Wellcome Fund in 2000 and is a fellow of the American Association for the Advancement of Science.


DeepGO: Predicting Protein Functions from Sequence and Interactions Using a Deep Ontology-aware Classifier

A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often done rigorously only for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. Protein functions are classified using the Gene Ontology (GO), which contains over 40,000 classes. Additionally, proteins can have multiple functions, making function prediction a large-scale, multi-class, multi-label problem.
We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as from a cross-species protein–protein interaction network. Our approach produces output that follows the structure of the GO and uses the dependencies between GO classes as background information when constructing the deep learning model. We evaluate our method using the standards established by the Critical Assessment of Functional Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.
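GO classes form a hierarchy, so a protein annotated with a class is implicitly annotated with all of that class's ancestors (the GO "true path rule"). Multi-label predictions are therefore often made consistent by requiring each class to score at least as high as any of its descendants. The sketch below shows that post-processing step on a hypothetical four-class toy ontology with made-up scores. It is a simplified stand-in for illustration, not DeepGO's actual model or its hierarchy-aware layer.

```python
def ancestors(term, parents):
    """All ancestors of a term, given a child -> list-of-parents mapping."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, []))
    return seen


def propagate_max(scores, parents):
    """Raise every ancestor's score to at least each descendant's score."""
    consistent = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            consistent[anc] = max(consistent.get(anc, 0.0), score)
    return consistent


def predict_labels(scores, parents, threshold=0.5):
    """Multi-label prediction: every class whose consistent score passes."""
    consistent = propagate_max(scores, parents)
    return sorted(t for t, s in consistent.items() if s >= threshold)


# Hypothetical toy ontology (child -> parents) and raw classifier scores.
parents = {
    "enzyme binding": ["protein binding"],
    "protein binding": ["binding"],
    "binding": ["molecular_function"],
}
raw = {"molecular_function": 0.1, "binding": 0.4,
       "protein binding": 0.3, "enzyme binding": 0.8}
print(predict_labels(raw, parents))
# ['binding', 'enzyme binding', 'molecular_function', 'protein binding']
```

Without propagation, the 0.5 threshold would accept "enzyme binding" and reject its parents, which is an annotation the ontology does not allow.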


Maxat Kulmanov is a PhD student at the Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Saudi Arabia.

Maxat Kulmanov is a Ph.D. student at KAUST, CBRC, in the Bio-Ontology Research Group (BORG), under the supervision of Professor Robert Hoehndorf. His interests include bioinformatics, knowledge representation and reasoning, machine learning, neural networks, the Semantic Web and algorithms. He is also interested in knowledge discovery and data integration using artificial intelligence and Semantic Web technologies in biology and biomedicine.


Integrative Drug Discovery Targeting Protein-Protein Interactions: The 2P2I Approach

Drug discovery is an inherently inefficient process, particularly in oncology. The difficulty of matching the immense and complex chemical world with a desired physiological effect is illustrated by limitations such as harmful side effects and drug resistance, which affect even the most powerful chemotherapeutics available. Novel therapeutic targets and new ways to identify, characterize and develop anti-cancer drugs are needed. Inhibitors of protein-protein interactions represent an alternative and almost unexplored reservoir for drug development in oncology. In this context, our objectives are to identify, understand, validate and target protein-protein interaction interfaces critically involved in tumor cell signaling, with the specific purpose of facilitating the transfer of therapeutic and pharmacological targets into preclinical and clinical development programs in oncology.
We have recently implemented an integrated drug discovery approach with an automated process combining chemoinformatics (development of software dedicated to in silico chemical reactions and systematic energetic evaluation, i.e. docking, of the resulting chemical libraries), chemical synthesis (diversity-oriented synthesis and hit explosion / Chemspeed SLTII platform) and HTS assays (pharmacological and biophysical evaluation / ECHO platform). This integrative effort will be illustrated with examples of hit discovery and hit-to-lead optimization in the development of bromodomain and PDZ domain inhibitors, as well as a drug-repurposing success with tyrosine kinase inhibitors (imatinib and masitinib).


Xavier Morelli is a ‘Director of Research’ at the Centre National de la Recherche Scientifique (CNRS, France), group leader at the Cancer Research Center of Marseille (CRCM) and scientific director of IPCDD, a drug discovery platform at the “Institut Paoli-Calmettes” hospital.

The main project of Xavier Morelli's team focuses on the identification, understanding and targeting of protein-protein interactions in cancer signaling using structural and chemical biology technologies. He has published more than 50 papers and filed 4 patent applications related to the inhibition of protein-protein interactions. He is a current member of the scientific committee of the 'Canceropole PACA' and a board member of the "GDR Chembioscreen", a national organization for chemical biology screening in France, and has served as secretary of the French Society of Chemoinformatics (SFCi). He has also served as a scientific board member of the Foundation ARC (national agency against cancer), associate editor of the journal BMC Pharmacology and Toxicology and editorial board member of several scientific journals. He is a regular consultant for pharmaceutical companies and a member of the Ph.D. program committee of the School of Chemistry at Aix-Marseille University.


A Synthetic Biology Approach to Waddington Landscape and Cell Fate Determination

The process of cell fate determination has been depicted intuitively as cells traveling and resting on a rugged landscape, which has been probed by various theoretical studies. However, few studies have experimentally demonstrated how underlying gene regulatory networks shape the landscape and hence orchestrate cellular decision-making in the presence of both signal and noise. Here we tested different topologies and verified that a synthetic gene circuit with mutual inhibition and auto-activation is quadrastable, which enables the direct study of quadruple cell fate determination on an engineered landscape. We show that cells indeed gravitate towards local minima and that signal inductions dictate cell fates by modulating the shape of the multistable landscape. Experiments guided by model predictions reveal that sequential inductions generate distinct cell fates by changing the landscape in sequence and hence navigating cells to different final states. This work provides a synthetic biology framework for approaching cell fate determination and suggests a landscape-based explanation of fixed induction sequences for targeted differentiation.
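The landscape picture can be explored with a toy two-gene model in which each gene activates itself and represses the other, integrated with a simple forward-Euler scheme. All parameters below (Hill coefficient, threshold, production and decay rates) are arbitrary illustrative choices. The model is a generic sketch, not the authors' circuit model, and it is not tuned to reproduce the four states reported for the engineered circuit. With these values, a start biased toward one gene ends with that gene high and the other low, while a symmetric start settles with both genes at an intermediate level. That is the kind of initial-condition-dependent fate the landscape metaphor describes.

```python
def hill_act(v, s=0.5, n=4):
    """Activating Hill function: rises from 0 toward 1 as v passes s."""
    return v**n / (s**n + v**n)


def hill_rep(v, s=0.5, n=4):
    """Repressing Hill function: falls from 1 toward 0 as v passes s."""
    return s**n / (s**n + v**n)


def rates(x, y, a=1.0, b=1.0, k=1.0):
    """Self-activation (a) plus repression by the other gene (b), minus decay (k)."""
    dx = a * hill_act(x) + b * hill_rep(y) - k * x
    dy = a * hill_act(y) + b * hill_rep(x) - k * y
    return dx, dy


def simulate(x0, y0, dt=0.01, steps=5000):
    """Forward-Euler integration up to t = dt * steps."""
    x, y = x0, y0
    for _ in range(steps):
        dx, dy = rates(x, y)
        x, y = x + dt * dx, y + dt * dy
    return x, y


for start in [(1.5, 0.1), (0.1, 1.5), (0.8, 0.8)]:
    x, y = simulate(*start)
    print(start, "->", (round(x, 2), round(y, 2)))
```

Changing `a`, `b` or the thresholds reshapes the landscape. This loosely mirrors how the abstract describes signal inductions steering cells toward different final states.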


Xiao Wang is an Associate Professor of Biomedical Engineering at Arizona State University, USA.

Prof. Xiao Wang received his Ph.D. degree from the University of North Carolina at Chapel Hill in 2006. As the Principal Investigator of the Systems and Synthetic Biology Research Group, he is interested in using both forward (synthetic biology) and reverse (systems biology) engineering approaches to understand biology. Specific research topics include engineering synthetic multistable gene networks, systems biology research on small network motifs with feedback, understanding the role of noise in cell differentiation and development, and analyzing molecular evolution.



KEYNOTE LECTURE: Adaptive Evolution of Vertebrate Vision

Vertebrate ancestors appeared in a uniform, shallow-water environment, but modern species flourish in highly variable niches. The striking array of phenotypes exhibited by contemporary animals is assumed to have evolved by accumulating a series of selectively advantageous mutations, but experimentally testing such adaptive events has been remarkably difficult. By genetically engineering 11 ancestral rhodopsins, which regulate dim-light vision, we have shown that early ancestral rhodopsins absorbed light maximally (λmax) at 500 nm, from which contemporary rhodopsins with variable λmax values of 480–525 nm evolved on at least 18 separate occasions. These highly environment-specific adaptations have occurred largely through amino acid replacements at 12 sites, whereas most replacements at the remaining 191 sites (~94%) have undergone neutral evolution. Comparing these results with those inferred by commonly used statistical methods demonstrates that statistical tests of positive selection can be misleading without experimental support, and that the molecular basis of spectral tuning in rhodopsins should be elucidated by mutagenesis analyses using ancestral pigments. In deep-sea environments, three genera of dragonfishes are unique in emitting bioluminescence, with peaks at 465–485 and 500–710 nm. These fishes can discriminate small wavelength differences between 470 and 580 nm using rhodopsins (RH1₁ pigments) and porphyropsins (RH1₂), which use vitamin A1 and vitamin A2, respectively. They also discriminate wavelengths within 430–470 and 580–720 nm using “rhodopsin-like” RH2₁ and RH2₂ pigments and long wavelength-sensitive LWS₁ and LWS₂ pigments, respectively. Researchers have argued that dragonfishes invented a new color vision system to visualize the new environment created by bioluminescence, but the data show that dragonfishes have recreated the light environments of shallow water in the deep sea by inventing blue, green, and far-red bioluminescence.
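As a quick check of the proportion quoted above, assume that the 12 adaptive sites and the remaining 191 sites together make up all 203 sites considered:

```latex
\frac{191}{12 + 191} = \frac{191}{203} \approx 0.941 \quad (\approx 94\%)
```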


Shozo Yokoyama moved to Emory’s Department of Biology in 2003 as an endowed chair, Asa G. Candler Professor of Biology.

Shozo Yokoyama earned his B.S. and M.S. degrees in Biology in Japan, and a Ph.D. degree in biomathematics in 1977 from the University of Washington. He moved to Emory's Department of Biology in 2003 as an endowed chair, Asa G. Candler Professor of Biology.

Previously, Dr. Yokoyama was at Washington University (1978–1987, Assistant Professor), the University of Illinois (1987–1991, Associate Professor), and Syracuse University (1991–2003). His research focuses on the molecular genetics and evolution of dim-light and color vision. Since 1990, his lab has been conducting genetic analyses of the visual pigments of a diverse range of organisms. His lab was the first to identify the amino acids that regulate red-green color vision and UV vision in various vertebrate species. These and other analyses of the origin and evolution of color vision produced "the deepest body of knowledge linking differences in specific genes to differences in ecology and to the evolution of species" (Sean Carroll, The Making of the Fittest, 2006).

Dr. Yokoyama is currently working to understand the molecular genetics and evolution of vertebrate and invertebrate vision using quantum chemistry. He has authored or co-authored over 150 articles. He was a Panel Member of the Genetics Study Section at the National Institutes of Health (1988–1991) and President of the American Genetic Association (2003).


The Metaorganism Imperative - We Are Not Alone

Recent years have brought a changing imperative in the life sciences, sparked by the revolution in genomic tools to study the molecular composition and functional organization of organisms. The development of next-generation sequencing changed our understanding of the microbial diversity associated with organisms and environments. A multitude of studies now support the notion that a host-specific microbiome associates with multicellular organisms and provides functions related to metabolism, immunity, and environmental adaptation, among others. Consequently, interactions and communication mechanisms among members of this metaorganism presumably play a major role in maintaining host health, microbiome stability, and resilience to environmental disturbance. This presentation will highlight and discuss recent efforts to investigate metaorganism function and evolution, and how an appreciation of host-microbe interactions provides new insight into host biology in light of the microbiome.


Dr. Voolstra is Associate Director of the Red Sea Research Center at King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Christian R. Voolstra is a biologist whose research area is environmental genomics, with a focus on the acclimation and adaptation of marine invertebrates. In particular, Dr. Voolstra studies coral metaorganism function by combining ecological, environmental, microbial, and molecular approaches. Corals are metaorganisms composed of the coral host, intracellular photosynthetic dinoflagellate symbionts, and associated microbiota. Together, these so-called coral holobionts form the keystone species of reef ecosystems. His most recent research has particularly advanced knowledge of how the bacterial microbiome contributes to coral host acclimation and adaptation. Dr. Voolstra has published over 100 peer-reviewed research papers and various book chapters, and holds patents related to bioactive lead structures from marine organisms. Dr. Voolstra is a Scientific Coordinator of the TARA Pacific consortium, a steering committee member of the Global Invertebrate Genomics Alliance (GIGA), and a KAUST representative of the Reef Future Genomics (ReFuGe) 2020 consortium. Dr. Voolstra received his Ph.D. at the Institute for Genetics in Cologne, Germany, in 2006 and was a postdoctoral scholar at the University of California, Merced, from 2007 to 2009. He was appointed Assistant Professor of Marine Science at KAUST’s Red Sea Research Center in 2009 and was promoted to Associate Professor in 2015. In 2016, Dr. Voolstra was appointed Associate Director of the Red Sea Research Center at KAUST.


From Fish to Fisherman: Evolutionary Genomics of Fish and Humans in the Amazon Basin

Evolutionary genomics has been used to better understand the evolution of species and to help develop sustainable practices for exploiting our environments.
One such environment is the Amazon basin, which covers roughly 40% of the South American continent. In my talk, I will discuss two projects in which evolutionary genomics approaches have been used to better understand the evolutionary dynamics of fish and humans in the region. First, I will describe the genome of the pirarucu fish (Arapaima gigas), a large and widespread fish in the Amazon. Second, I will discuss the sequencing and analysis of dozens of exomes from native Amazonian people.


Sandro J. de Souza is Professor of Bioinformatics at the Bioinformatics Multidisciplinary Environment (BioME) at UFRN in Natal.

Sandro J. de Souza is a biologist who has been a pioneer in genomics and bioinformatics in Brazil. He is currently Professor of Bioinformatics at the Bioinformatics Multidisciplinary Environment (BioME) at UFRN in Natal, Brazil.
Awards and honors include: i) Pew Latin American Fellow at Harvard University from 1995 to 1998, ii) election as a "Young Global Leader" by the World Economic Forum in 2009, and iii) Tinker Visiting Professor at the University of Chicago in 2011.


Genomics-led Development of Cyanobacterial Cell Factories for Industrial Applications in Saudi Arabia

The climate, environment and industrial infrastructure of the Kingdom of Saudi Arabia (KSA) make it an ideal location for marine photosynthetic microbial cell factory (PMCF) technologies. However, to realize this potential we must first develop PMCF strains capable of thriving in the harsh KSA climate. Generally, photosynthetic microbes are limited by their tolerance of light, temperature and salinity. Photosynthetic microbes from the Red Sea, which already live under high salinity, temperature and insolation, therefore offer a very promising source of potential PMCF strains, from which robust, highly efficient photosynthetic microbes adapted to the relevant conditions can be developed. Here, we describe an isolation-characterization and genomic modeling pipeline targeting highly robust marine photosynthetic microbes, from which PMCFs are developed by directed evolution approaches. To date, we have generated over 1,200 primary isolates, from which a group of 120 lead strains was developed. From the latter, 44 optimized lead strains have been produced. Selected strains have been licensed for industrial applications in KSA.



John Archer is Principal Research Scientist at the Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Saudi Arabia.

John Archer leads Data Generation, Validation and Engineering in the Computational Bioscience Research Center (CBRC). His research applies genomic and metabolomic data to the development of non-model photoautotrophic microbial systems as cell factories for industrial applications in KSA, using molecular and synthetic biology approaches. Following his Ph.D. in molecular genetics from Glasgow University and a postdoc at MIT, he joined the faculty of Cambridge University. In 2009, he joined KAUST as a founder. He has authored or co-authored over 132 academic publications, technology disclosures and patents. He founded Kyanos Biotechnology, the first biotechnology manufacturing spin-out in KSA.


KEYNOTE LECTURE: A Novel Graph-Based Constant-Column Biclustering Method for Mining Growth Phenotype Data

Growth phenotype profiling of genome-wide gene-deletion strains under stress conditions can clearly show that gene essentiality depends on environmental conditions. Systematically identifying groups of genes from such high-throughput data that share similar patterns of conditional essentiality and dispensability under various environmental conditions can elucidate how genetic interactions of the growth phenotype are regulated in response to the environment. In this talk, I will first demonstrate that detecting such “co-fit” gene groups can be cast as a less well-studied biclustering problem, namely constant-column biclustering. Despite significant advances in biclustering techniques, very few were designed for mining growth phenotype data. I will then propose Gracob, a novel, efficient graph-based method that casts and solves the constant-column biclustering problem as a maximal clique finding problem in a multipartite graph. We compared Gracob with a large collection of widely used biclustering methods that cover different types of algorithms designed to detect different types of biclusters. Gracob outperformed all the existing methods in finding co-fit genes, both on a variety of synthetic data sets with a wide range of settings and on three real growth phenotype data sets for E. coli, proteobacteria, and yeast.
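To make the term “constant-column bicluster” concrete, the Python sketch below checks whether a chosen set of genes (rows) and conditions (columns) forms one: within the selected genes, each selected condition shows roughly the same fitness value, while different conditions may hold different values. This is not Gracob; the function name, the tolerance parameter `tol` and the toy fitness matrix are hypothetical, and Gracob's clique search in a multipartite graph is not reproduced here.

```python
from typing import Sequence


def is_constant_column_bicluster(
    matrix: Sequence[Sequence[float]],
    rows: Sequence[int],
    cols: Sequence[int],
    tol: float = 0.0,
) -> bool:
    """Hypothetical helper: True if, within the selected rows, every selected
    column varies by at most `tol` (each column is near-constant, while
    different columns may hold different values)."""
    for c in cols:
        values = [matrix[r][c] for r in rows]
        if max(values) - min(values) > tol:
            return False
    return True


# Toy fitness matrix (made-up values): rows = gene-deletion strains, columns = conditions.
fitness = [
    [-2.0, 0.1, 1.5, 0.3],
    [-2.1, 0.9, 1.4, -0.5],
    [0.2, 0.0, -1.0, 0.4],
]

print(is_constant_column_bicluster(fitness, rows=[0, 1], cols=[0, 2], tol=0.2))  # True
print(is_constant_column_bicluster(fitness, rows=[0, 1], cols=[0, 1], tol=0.2))  # False
```

Strains 0 and 1 are “co-fit” on conditions 0 and 2 (both sick in condition 0, both fit in condition 2) even though the two columns hold different values; that is what separates a constant-column bicluster from a fully constant one.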


Dr. Xin Gao is an associate professor of computer science in the Computer, Electrical and Mathematical Sciences and Engineering Division at King Abdullah University of Science and Technology (KAUST), Saudi Arabia.

Dr. Xin Gao is also a PI in the Computational Bioscience Research Center at KAUST and an adjunct faculty member at the David R. Cheriton School of Computer Science at the University of Waterloo, Canada.

Prior to joining KAUST, he was a Lane Fellow at the Lane Center for Computational Biology in the School of Computer Science at Carnegie Mellon University, U.S. He earned his bachelor's degree in Computer Science in 2004 from the Department of Computer Science and Technology at Tsinghua University, China, and his Ph.D. in Computer Science in 2009 from the David R. Cheriton School of Computer Science at the University of Waterloo, Canada.

Dr. Gao’s research interests include building computational models, developing machine learning techniques, and designing efficient and effective algorithms, with a particular focus on applications to key open problems in structural biology, systems biology and synthetic biology. He has co-authored more than 120 research articles in the fields of bioinformatics, machine learning, and statistics.


A Brief Review of Computational Algorithms Inspired by Nature

Optimization problems from different fields are generally difficult to solve because of their large solution spaces. These problems arise in daily life and range from determining the best delivery plan for a distribution company to finding genes associated with specific diseases, where an exhaustive search would take longer than the universe has existed. Many optimization algorithms exist, and a range of nature-inspired metaheuristics has been proposed to improve on their results. These nature-based algorithms mimic the behavior of ant colonies, bees, bats, wolves, and even black holes, among others, and have produced remarkable solutions to a variety of problems. In this talk, I will present a brief overview of these nature-based algorithms and some successful applications.
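As one concrete instance of the nature-inspired metaheuristics listed above, the sketch below applies a basic ant colony optimization scheme (in the style of the classic Ant System) to a tiny travelling salesman problem. It is an illustration, not material from the talk: the parameter names `alpha`, `beta`, `rho` and `q`, their default values and the five-city example are assumptions.

```python
import math
import random


def tour_length(tour, dist):
    # Length of a closed tour that returns to its starting city.
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def ant_colony_tsp(dist, n_ants=10, n_iters=50, alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone level on each edge
    best_tour, best_len = None, math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i = tour[-1]
                choices = list(unvisited)
                # Attractiveness of edge (i, j) = pheromone^alpha * (1 / distance)^beta
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta for j in choices]
                j = rng.choices(choices, weights=weights)[0]
                tour.append(j)
                unvisited.remove(j)
            tours.append(tour)
        # Evaporation: every edge loses a fraction rho of its pheromone.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1 - rho
        # Deposit: each ant reinforces its edges in proportion to 1 / tour length.
        for tour in tours:
            length = tour_length(tour, dist)
            if length < best_len:
                best_tour, best_len = tour, length
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += q / length
                tau[b][a] += q / length
    return best_tour, best_len


points = [(0, 0), (0, 1), (1, 1), (1, 0), (0.5, 2)]
dist = [[math.dist(p, r) for r in points] for p in points]
tour, length = ant_colony_tsp(dist)
print(tour, round(length, 3))
```

Short edges in good tours accumulate pheromone faster than evaporation removes it, so later ants are increasingly drawn to them; this positive feedback is the colony behavior the metaheuristic borrows.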


Arturo Magana-Mora is a postdoctoral research fellow at AIST, Japan

Arturo Magana-Mora obtained his Ph.D. in Computer Science at King Abdullah University of Science and Technology (KAUST) under the supervision of Prof. Vladimir Bajic in 2017, and later joined the National Institute of Advanced Industrial Science and Technology (AIST) in Japan as a postdoctoral research fellow in the Com. Bio Big-Data Open Innovation Lab (CBBD-OIL). His research interests include the development of novel machine learning and data mining techniques to address complex problems in biology. His research has resulted in several peer-reviewed publications in high-quality journals, and he currently serves as a referee for several scientific journals.


Genetic Algorithms and their Application in Machine Learning

Genetic Algorithms (GAs) were inspired by the evolutionary ideas of natural selection and genetics, in particular Charles Darwin's principle of natural selection, often summarized as "survival of the fittest". They were designed as adaptive heuristic search algorithms for solving optimization problems. This talk gives an introduction to the basic concepts and terminology of Genetic Algorithms. It then discusses GAs in Machine Learning, along with some application examples.
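The basic GA loop (a population of candidate solutions, fitness-based selection, crossover and mutation) can be sketched in a few lines of Python. This minimal sketch uses the toy "OneMax" problem (maximize the number of 1s in a bit string); tournament selection, one-point crossover, bit-flip mutation, elitism and all parameter values are illustrative choices, not details taken from the talk.

```python
import random


def fitness(bits):
    # OneMax: the fitness of a bit string is simply its number of 1s.
    return sum(bits)


def tournament(population, rng, k=3):
    # Selection: pick k individuals at random and keep the fittest.
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    # One-point crossover: prefix from one parent, suffix from the other.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    # Bit-flip mutation: each bit flips independently with probability `rate`.
    return [1 - b if rng.random() < rate else b for b in bits]


def genetic_algorithm(n_bits=20, pop_size=30, generations=40, mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness recorded at the start of each generation
    for _ in range(generations):
        elite = max(population, key=fitness)
        history.append(fitness(elite))
        next_gen = [elite]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            parent1 = tournament(population, rng)
            parent2 = tournament(population, rng)
            next_gen.append(mutate(crossover(parent1, parent2, rng), rng, mutation_rate))
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = genetic_algorithm()
print(fitness(best), "of", len(best), "bits set; best fitness per generation:", history)
```

Because of elitism, the recorded best fitness never decreases from one generation to the next. In a machine learning setting, the bit string could instead encode, for example, a feature subset, with a validation score as the fitness.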


Dr. Xiangliang Zhang is an Associate Professor of Computer Science at King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Dr. Xiangliang Zhang is an Associate Professor of Computer Science at KAUST, Saudi Arabia. She earned her Ph.D. in computer science from INRIA-Université Paris-Sud, France, in July 2010. She leads the MINE group, which focuses on learning from complex and large-scale streaming data.


Biologically-Inspired Optimization Algorithms for View Selection in OLAP

On-Line Analytical Processing (OLAP) systems provide efficient low-level database support for a variety of data analysis, machine learning and knowledge extraction tasks that involve very large datasets. This talk will: (i) introduce the main concepts of OLAP, in particular the multi-dimensional data model that is relevant to machine learning, where dimensions correspond to “features”; (ii) explain how precomputing various projections of the data can significantly accelerate data analysis; (iii) present the view selection problem, which decides the set of projections to precompute in order to optimize performance under storage and data maintenance time constraints; and (iv) discuss how biologically inspired algorithms, such as ant colony optimization and genetic algorithms, have been employed to solve the view selection problem.
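To make points (ii) and (iii) concrete, the Python sketch below frames view selection on a tiny three-dimensional data cube. It assumes the simple linear cost model, in which answering a group-by query costs the number of rows in the smallest precomputed view that covers its dimensions, and it ignores maintenance time; the view sizes, query workload and storage budget are made-up numbers. Exhaustive search is used only because the toy lattice has six candidate views; for realistic cubes, this enumeration is the step a genetic algorithm or ant colony optimizer would replace.

```python
from itertools import combinations


def query_cost(query, materialized, sizes):
    # Linear cost model (assumption): scan the smallest materialized view
    # whose dimensions include every dimension the query groups by.
    return min(sizes[v] for v in materialized if query <= v)


def workload_cost(queries, materialized, sizes):
    return sum(query_cost(q, materialized, sizes) for q in queries)


def select_views(sizes, base, queries, budget):
    """Exhaustively pick the extra views (beyond the always-stored base view)
    that minimize workload cost while their total size stays within `budget`."""
    candidates = [v for v in sizes if v != base]
    best_choice = frozenset()
    best_cost = workload_cost(queries, {base}, sizes)
    for k in range(1, len(candidates) + 1):
        for choice in combinations(candidates, k):
            if sum(sizes[v] for v in choice) > budget:
                continue  # violates the storage constraint
            cost = workload_cost(queries, {base, *choice}, sizes)
            if cost < best_cost:
                best_choice, best_cost = frozenset(choice), cost
    return best_choice, best_cost


def view(dims):
    # A view is identified by the set of dimensions it groups by.
    return frozenset(dims)


# Hypothetical view sizes (rows) for a cube with dimensions A, B, C.
sizes = {view("ABC"): 100, view("AB"): 50, view("AC"): 60, view("BC"): 20,
         view("A"): 10, view("B"): 5, view("C"): 8}
queries = [view("A"), view("B"), view("C"), view("AB"), view("BC")]

chosen, cost = select_views(sizes, base=view("ABC"), queries=queries, budget=60)
print(sorted("".join(sorted(v)) for v in chosen), cost)  # ['A', 'B', 'BC', 'C'] 143
```

In this toy instance, the best choice stores BC plus the three one-dimensional views (43 rows in total) and cuts the workload cost from 500 to 143; the AB query still falls back to the base view, because materializing AB would consume most of the budget.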


Panos Kalnis is Professor and Chair of the Computer Science program, King Abdullah University of Science and Technology (KAUST).

In 2009 he was a visiting assistant professor in the CS Dept., Stanford University. Before that, he was an assistant professor in the CS Dept., National University of Singapore (NUS). Earlier, he was involved in the design and testing of VLSI chips and worked at several companies on database design, e-commerce projects and web applications. He served as an associate editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE) from 2013 to 2015, and on the editorial board of the VLDB Journal from 2013 to 2017. Currently, he is on the editorial board of the Data Science and Engineering Journal. He received his Diploma from the Computer Engineering and Informatics Dept., Univ. of Patras, Greece, in 1998 and his Ph.D. from the Computer Science Dept., Hong Kong Univ. of Science and Technology (HKUST), in 2002. His research interests include Big Data, Cloud Computing, Parallel and Distributed Systems, Large Graphs and Long Sequences.


Characterization of Red Sea Cyanobacteria for Cell Factory Application in Saudi Arabia


Yi Mei Ng is a Ph.D. Student at the Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Research Interest: Marine microbiology, molecular biology, algal technology.


Functional Interrogation of Malaria Metabolism Reveals Species- and Stage-specific Differences in Nutrient Essentiality and Drug Targeting


Alyaa Mohamed is a Ph.D. Student at the Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Research Interest: Genome-scale metabolic modeling, Gene expression analysis, Constraint-based modeling.


Genome-scale Evaluation of the Biotechnological Potential of Red Sea Derived Bacillus Strains


Ghofran Othoum is a Ph.D. Student at the Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Research Interest: Metabolic Reconstruction, Metabolic Modeling, Pathway Optimization.


DEEPre: Sequence-based Enzyme EC Number Prediction by Deep Learning


Yu Li is a Ph.D. Student at the Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Research Interest: Deep learning and bioinformatics.


Ontology Design Patterns for Combining Pathology and Anatomy: Application to Study Aging and Longevity in Inbred Mouse Strains


Research Interest: Artificial Intelligence, Bioinformatics


KEYNOTE LECTURE: On Unnatural Selection: Lessons Learned from Large Scale Medical Genomics in Saudi Arabia

Natural selection is the axiom of evolutionary biology. Humans, however, can manipulate natural selection in ways that sometimes contradict it. One example is consanguineous mating, a practice that is both common and ancient in Saudi Arabia despite its clearly negative effect on reproductive fitness. The perceived societal benefits of consanguineous marriages are not sufficient to compensate for the reduced reproductive fitness, so there must be compensatory biological factors, perhaps the most significant of which is high fertility (a common covariable). This long-term “tug of war” has left many imprints on the Saudi genome with important medical implications. These include the conspicuous lack of the “purging effect” and its consequences for carrier frequency interpretation and for the overall Mendelian disease burden in society. Furthermore, the characteristic signature of autozygosity presents numerous opportunities for the annotation of the human genome. In my talk, I will discuss these points in detail based on our experience with large-scale medical genomics in Saudi Arabia.


Fowzan Alkuraya is Professor of Human Genetics at Alfaisal University and Senior Consultant and Principal Clinical Scientist at King Faisal Specialist Hospital and Research Center

Fowzan Alkuraya graduated with first-class honors and was the valedictorian of his class at the College of Medicine, King Saud University, Riyadh, Saudi Arabia. He did his pediatric residency at Georgetown University Hospital, followed by a fellowship in clinical genetics and another in molecular genetics at Harvard Medical School. He also did a postdoctoral research fellowship in developmental genetics in the lab of Prof. Richard Maas at Harvard Medical School. He returned to his native Saudi Arabia to establish the Developmental Genetics Lab, and later the Mendelian Genomics Program, at KFSHRC. He is an authority in Mendelian genetics, with more than 320 published manuscripts that describe his lab's discovery of hundreds of novel disease genes in humans. He is a frequently invited speaker at local, regional and international conferences, serves on the editorial boards of prominent human genetics journals, and has received numerous prestigious awards, including the William King Bowes Award in Medical Genetics and the King Salman Award for Disability Research.


Methylation as a Tool for Monitoring Colon Cancer Diversity: Technical Challenges and Possibilities

Cancer cell populations are well known to evolve rapidly and can diversify widely. Being able to track the evolution of a cancer cell population through informative markers has enormous potential in cancer medicine.
Methylation is deeply involved in several cancers, and methylation patterns are increasingly used in combination with mutational and clinical information as a tool in personalized medicine.
Using the example of a study in colon cancer, this talk will illustrate the state of the art in measuring methylation levels and how that information can be used to accurately predict a wide variety of cancer subtypes.


Roberto Incitti is a Bioinformatician at the Computational Bioscience Research Center, King Abdullah University of Science and Technology (KAUST), Saudi Arabia

Roberto Incitti is a Senior Bioinformatician at KAUST's CBRC. His research interest is cancer biomarkers, and he is a founding member of OncoDiag, a biotech start-up company producing a test for the early detection of bladder cancer.


A Portable System for Rapid Bacterial Identification Using a Single-molecule DNA Sequencer and its Application to the Diagnosis of Infectious Diseases

Bacterial infection is still a serious threat to humans. Because disease-causing bacteria are so diverse, current bacterial identification tests based on bacterial culture or antibodies are limited. DNA sequencing is better suited to identifying bacterial species correctly, but the time and cost of sequencing have been a problem. We therefore developed a genome analysis system for the rapid diagnosis of infectious diseases. Using a nanopore-based, single-molecule DNA sequencer, MinION, and two high-spec laptop computers, we assembled a system that can, in principle, identify bacterial species in about one hour from a DNA sample of a bacterial infection.
One of the key technologies here was our original database, which comprehensively collects genome sequence data. The database, called GenomeSync, is one of the largest collections of complete genome sequences, containing genomes of 26,223 bacterial species (as of October 2017). Another essential technology was a software package, called GSTK, for quick and accurate identification of the bacterial species of each sequencing read. GSTK can run sequence similarity searches against the GenomeSync database on a multi-core computer and summarize the outputs of these searches based on biological taxonomy. Furthermore, to get sequencing results as quickly as possible, we installed software that extracts sequence data from the sequencer output while the sequencer is running, which enabled us to analyze sequence data in real time.
By successfully combining these core technologies, we achieved rapid and accurate analysis of bacterial composition. We hope that our system will be widely used for the diagnosis of infectious diseases in the future.
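The abstract does not say how GSTK summarizes similarity-search hits by taxonomy. One widely used approach (for example in the MEGAN tool) assigns each read to the lowest common ancestor (LCA) of the taxa it hits and then counts reads per taxon. The Python sketch below illustrates only that general idea; the function names, the parent-pointer taxonomy and the example reads are hypothetical and do not describe GSTK's interface.

```python
from collections import Counter


def lineage(taxon, parent):
    # Walk parent pointers up to the root, then return the path root-first.
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path[::-1]


def lowest_common_ancestor(taxa, parent):
    paths = [lineage(t, parent) for t in taxa]
    lca = None
    for nodes in zip(*paths):  # compare lineages level by level from the root
        if all(n == nodes[0] for n in nodes):
            lca = nodes[0]
        else:
            break
    return lca


def summarize_reads(read_hits, parent):
    # Count reads per assigned taxon; reads with no hits are skipped.
    return Counter(lowest_common_ancestor(hits, parent) for hits in read_hits.values() if hits)


# Toy taxonomy as child -> parent links (illustrative and heavily truncated).
parent = {
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Salmonella enterica": "Salmonella",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}
read_hits = {
    "read1": ["Escherichia coli"],
    "read2": ["Escherichia coli", "Escherichia fergusonii"],
    "read3": ["Escherichia coli", "Salmonella enterica"],
    "read4": ["Escherichia coli"],
    "read5": [],
}
print(summarize_reads(read_hits, parent))
```

Here read2 is assigned only to the genus Escherichia and read3 only to the family Enterobacteriaceae, because their hits disagree below those levels; ambiguous reads are therefore reported at a coarser but still correct rank instead of being forced onto one species.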


Tadashi Imanishi is Professor at Tokai University School of Medicine and Visiting Professor at Hokkaido University, Graduate School of Information Science and Technology

  • 1988: Graduated from the University of Tokyo, School of Science (awarded Bachelor of Science)
  • 1993: Graduated from the University of Tokyo, Graduate School of Science (awarded Doctor of Science)
  • 1993-1994: Postdoctoral Fellow (PD) of the Japan Society for the Promotion of Science
  • 1994-2001: Assistant Professor at the National Institute of Genetics
  • 1998-1999: University of Washington (Seattle, USA), Visiting Scholar
  • 2001-2012: Team Leader at the National Institute of Advanced Industrial Science and Technology
  • 2004-present: Visiting Professor at Hokkaido University, Graduate School of Information Science and Technology
  • 2012-present: Professor at Tokai University School of Medicine


KEYNOTE LECTURE: Insights into Cardiovascular Disease Risk in Large Data Sets

UK Biobank was established to improve understanding of the causes of common diseases, including coronary artery disease (CAD), and recruited 502,713 individuals aged 40-69 (94% of self-reported European ancestry) between 2005 and 2010. In addition to self-reported disease outcomes and extensive health and lifestyle questionnaire data, participants are being tracked through their NHS records and national registries (including causes of death, hospitalisations and primary care records).
CAD is the commonest cause of death in the world. Large-scale genetic studies in UK Biobank and other large cohorts have so far identified 100 robustly associated risk loci, leading to a better understanding of disease biology. We can now start exploring the genetic risk that CAD shares with the broader spectrum of cardiovascular diseases, including stroke, heart failure, pulmonary artery disease and atrial fibrillation, as well as with traditional risk factors.
In parallel, the Genomics England Clinical Interpretation Cardiovascular Domain is leveraging large-scale whole-genome sequencing data from the 100,000 Genomes Project to improve our understanding of the genetic basis of rare inherited cardiovascular disorders.



Panos Deloukas is Chair of Cardiovascular Genomics, Director of the Centre for Genomic Health and Dean for Life Sciences at Queen Mary University of London

Panos Deloukas is a world leader in complex trait genetics, focused on elucidating the genetic basis of cardio-metabolic traits. He leads or co-leads many global consortia in common disease genetics (CARDIoGRAMplusC4D, GLGC, GIANT), which have identified many of the loci underpinning lipid levels and coronary disease.

At the Wellcome Trust Sanger Institute (1994-2013) he made major contributions to the Human Genome Project and the International HapMap project. He was a founder member of the Wellcome Trust Case-Control Consortium. He has authored over 430 publications (H-index 130) and has been amongst the world's top 1% of most highly cited researchers in Molecular Biology & Genetics continuously since 2002.


Student Poster Competition



From L-R: Vasiliki Kordopati (Audience Choice Award), Alyaa Mohamed (1st Prize), Nicholas Gagnon (3rd Prize). Not in picture: Amani Al Ma'abadi (2nd Prize)




  • Where is the conference located?

    The KAUST Research Conference on Big Data Analyses in Evolutionary Biology will take place at King Abdullah University of Science and Technology (KAUST), in the Engineering Science Hall (Building 9), Level 2, Lecture Hall 2.

    I would like to present at the conference. How can I do that?

    Presenting at the conference is by invitation only, except for a limited number of Ph.D. students whose poster session applications are accepted.

    Will I get a conference T-shirt?

    Yes! First come, first served until we run out.


  • I would like to attend the conference. How do I register?

    Due to Saudi visa requirements, we can accept out-of-Kingdom attendees only in highly exceptional cases. In-Kingdom attendees are highly encouraged to register; however, they will have to cover their own costs and make their own travel arrangements.

    How can I register for the conference?

    You can register or unregister:
    (1) by filling in the form on our website under the tab “Registration”, or
    (2) by sending an email to or
    (3) in person, before a session starts.

Poster competition

  • I'm in KAUST. How do I join the student poster competition?

    The deadline for internal submissions is November 23, 2017. This is open to all KAUST students with research relevant to the conference title (Big Data Analyses in Evolutionary Biology). Submit your poster online by filling in the form on our website under the tab "Poster Submission".

    What is the guideline for the student poster competition?

    - Poster should be sent in PDF format, portrait layout and A0 size (841mm x 1189mm or 33.11" x 46.81").
    - Authors are expected to be by their posters during the poster session to answer questions from judges and attendees.
    - Participants are responsible for removing and collecting their poster after the conference. Unclaimed posters will be discarded.


  • Is there any business center around the conference venue?

    Yes, KAUST Library is open 24 hours and offers workstations and a quiet space for work and study. Wi-Fi is also accessible everywhere on the KAUST campus.

    What should I wear at KAUST?

    December at KAUST will be warm and likely pleasantly breezy with a slight possibility of rain. Smart casual attire is required at the conference. Women do not have to wear an abaya but should be dressed in conservative clothing (neck, arms and legs covered with loosely fitted clothing).

    What are the recreation options available at KAUST?

    Many recreation options are available at KAUST. To find out more, visit this link:



Visit our Flickr album for photos of the conference.
