# Bioinformatics Software and Knowledgebases

### Context

The focus of our research is modeling key processes in the living cell in order to explore how the cell reacts to different challenges: for example, accurately simulating the effects of chemicals on different cellular processes, modeling synthetic biology constructs, optimizing metabolic pathways, and reducing the search space for suitable pathways. The solutions in this domain are generic and could find applications in single-celled as well as complex multicellular organisms.
From the perspective of computational science, the development of computational methods and their application in biology are key areas for generating high-impact results and discoveries. This is driven primarily by recent developments in systems approaches to biological research that use a variety of high-throughput techniques (genomics, transcriptomics, proteomics and metabolomics). These techniques generate large quantities of data and provide insights into whole-genome responses, but they do not directly answer important biological questions. Their output has several distinctive characteristics:
• Size: the resulting experimental data sets are often enormous, on the order of gigabytes to terabytes.
• Complexity: the data point to networks of extremely complex interactions among the entities they describe.
• Dependency: the available information is typically strongly context-dependent, which further complicates the analysis of biologically relevant relationships.
Traditional computational methodologies, applied directly, cannot adequately address these three challenges because the resulting problems are computationally intractable. For that reason, a considerable part of CBRC's activity is devoted to addressing them, largely through two complementary approaches:
• Development and application of methods that yield novel algorithms capable of coping with computational intractability by providing acceptable solutions.
• Development of novel algorithms, and modification of existing ones, so that they execute efficiently on HPC systems. This involves, among other things, code optimization and parallelization that make it possible to fully harness the unique advantages offered by KAUST's leading-edge HPC and other computational resources within an academic environment.
The ability to analyze large and complex data sets supports the Center's research on a broad range of biological problems. These include, for example, modeling biological networks, constructing in silico genetic and metabolic cellular models, and modeling and metabolic engineering of MCFs. Some of the major topics explored are described below, although this program encompasses most of the other research activities in CBRC.

### NGS and omics data analysis

This project focuses on developing methods and platforms for the analysis and integration of the massive data generated by high-throughput (HT) experimental technologies, with the aim of inferring useful knowledge from the information contained in the data. Such methods are important for characterising various aspects of cellular behaviour and phenotype (e.g. promoter profiling, enhancer prediction and profiling, genomic biomarker identification, chromatin modifications, gene methylation and transcription regulation). CBRC has solid experience in the analysis and integration of the following types of data:

• ChIP-seq data (binding sites of various transcription factors, histone modification marks, other nuclear DNA-binding proteins and protein complexes, histone acetyltransferases and methyltransferases).

• Footprints based on DNase I hypersensitive sites (DHS) or formaldehyde-assisted isolation of regulatory elements (FAIRE).

• Transcriptomics data from CAGE (cap analysis of gene expression) tags, RNA-seq and microarrays.

• Ribo-seq (ribosome profiling).

Some parts of this project contribute to the CBRC CCF Program.

### Microbial genomics and metagenomics

This project focuses on genomic studies of individual microbial strains and of metagenomic samples, mainly but not exclusively from the Red Sea (e.g. water columns, brine pools and lagoons). CBRC develops methods and tools for the efficient functional annotation of such data, as well as systems for the targeted search for industrially and pharmacologically interesting enzymes. Much of this research is carried out in collaboration with the Red Sea Research Center at KAUST. This project is part of the CBRC CCF Program. Some of the resources developed are:

• INDIGO

• DMAP

• BEACON

• DEOP