Computational Methods for ChIP-seq Data Analysis and Applications Haitham M. Ashoor, Ph.D., Computer Science Apr 10, 16:00 - 17:30 B3 L5 5209 computation techniques machine learning bioinformatics data analysis Abstract The development of chromatin immunoprecipitation followed by sequencing (ChIP-seq) technology has enabled the construction of genome-wide maps of protein-DNA interactions. Such maps provide information about transcriptional regulation at the epigenetic level (histone modifications and histone variants) and at the level of transcription factor (TF) activity. This dissertation presents novel computational methods for ChIP-seq data analysis and applications, and it addresses four main challenges. First, I address the problem of detecting histone modifications from
Genetic Algorithms for Optimization of Machine-learning Models and their Applications in Bioinformatics Arturo Magana Mora, Ph.D., Computer Science Apr 10, 13:00 - 15:00 B3 L5 R5209 machine learning data mining biology genetics bioinformatics Abstract Machine-learning (ML) techniques have been widely applied to solve different problems in biology. However, biological data are large and complex, which often results in extremely intricate ML models. These models frequently perform poorly or are computationally infeasible. This dissertation presents a set of novel computational methods that use genetic algorithms (GAs) to simplify and optimize ML models, and applies them to biological problems. The dissertation addresses the following three challenges. The first challenge is
Novel Computational Methods that Facilitate Development of Cyanofactories for Free Fatty Acid Production Olaa A. Motwalli, Ph.D., Computer Science Apr 9, 16:00 - 17:00 B3 L5 R5209 machine learning bioinformatics graph mining genomics Abstract Finding a source from which high-energy-density biofuels can be derived at an industrial scale has become an urgent challenge for renewable energy production. Some microorganisms can produce free fatty acids (FFA) as precursors of such high-energy-density biofuels. In particular, photosynthetic cyanobacteria are capable of directly converting carbon dioxide into FFA. However, current engineered strains need several rounds of engineering before their FFA production reaches commercially viable levels. Thus, new chassis strains that require less engineering are needed
Novel Data Mining Methods for Virtual Screening of Biologically Active Chemical Compounds Othman Soufan, Ph.D., Computer Science Nov 16, 14:00 - 15:00 H2 B9 machine learning data mining Computational biology biomedical applications Chemical compounds visualization Abstract Drug discovery is a process that takes many years and hundreds of millions of dollars to reach a confident conclusion about a specific treatment. Part of this sophisticated process relies on preliminary investigations that suggest a set of chemical compounds as candidate drugs for the treatment. Computational resources have played a significant role in this part through a step known as virtual screening. From a data mining perspective, the availability of rich data resources is key to training prediction models. Yet, the difficulties imposed by the large expansion in data and its
Welcome to the KAUST UQ School on Numerical Methods for Direct and Inverse Problems 2016, 22-28 May Raul Tempone, Professor, Applied Mathematics and Computational Sciences May 22, 12:00 - May 28, 12:00 B9 H1 R2322 numerical methods stochastic differential equations statistics KAUST UQ SCHOOL 2016 is an annual thematic conference at King Abdullah University of Science and Technology organized by Raul Tempone, Professor of Applied Mathematics and Computational Sciences at CEMSE (Computer, Electrical and Mathematical Sciences & Engineering Division). Tempone’s interests in the mathematical foundations of computational science and engineering are reflected in this summer school. The school’s goal is to provide participants with an overview of the most recent research progress in the field of uncertainty quantification, with emphasis on • Multi-Level and Multi-Index
Workshop on Statistical Process Monitoring and Risk Assessment for Engineering and Spatial Environmental Applications Mar 13, 09:00 - Mar 15, 15:00 B1 L4 R4102 Environmental Statistics spatial statistics LIST OF SPEAKERS KAUST Environmental Statistics Group, CEMSE Division - Ying Sun (ying.sun@kaust.edu.sa), PI - Fouzi Harrou (fouzi.harrou@kaust.edu.sa), Postdoc - Huang Huang (huang.huang@kaust.edu.sa), PhD Student - Tianbo Chen (tianbo.chen@kaust.edu.sa), PhD Student - Rui Meng (rui.meng@kaust.edu.sa), Master's Student - Sulaiman Binkhamis (sulaiman.binkhamis@kaust.edu.sa), Master's Student Hydrology and Land Observation Group, BESE Division - Gaohong Yin (gaohong.yin@kaust.edu.sa), Master's Student Collaborators - NorEddine Ghaffour (noreddine.ghaffour@kaust.edu.sa), co-I, WDRC - Matthew McCabe
A Distributed Implementation of the Multi-resolution Approximation for Very Large Spatial Data Dorit Hammerling, National Center for Atmospheric Research (NCAR) Feb 10, 15:30 - 16:30 B1 L4 R4102 spatial statistics With rapidly growing data volumes in the environmental and geosciences, such as satellite observations and high-resolution climate model runs, the spatial statistics community has recently focused on methods that are applicable to very large data sets. One such state-of-the-art method is the multi-resolution approximation (MRA), which was specifically developed with high-performance computing architectures in mind.
Disease Risk Estimation by Combining Case-Control Data with Aggregated Information on the Population at Risk Xiaohui Chang, Assistant Professor, College of Business at Oregon State University Nov 9, 15:30 - 16:00 B1 L4 R4102 statistics We propose a novel statistical framework that supplements case–control data with summary statistics on the population at risk for a subset of risk factors. Our approach is to first form two unbiased estimating equations, one based on the case–control data and the other on both the case data and the summary statistics, and then combine them optimally into a single estimating equation used for estimation.
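As a simplified, hypothetical illustration of what "optimally combining" two unbiased sources of information can mean, the scalar sketch below weights two unbiased estimates of the same parameter so that their linear combination has minimum variance. The function name `combine_unbiased` and the assumption that both variances and their covariance are known are illustrative only; the talk combines estimating equations rather than point estimates, and its weighting may differ.

```python
def combine_unbiased(est1, est2, var1, var2, cov12=0.0):
    """Minimum-variance combination w*est1 + (1 - w)*est2 of two unbiased estimates.

    Assumes (hypothetically) that var1, var2 and cov12 are known exactly.
    Because the weights sum to 1, the combination is still unbiased.
    """
    denom = var1 + var2 - 2.0 * cov12
    if denom <= 0:
        raise ValueError("var1 + var2 - 2*cov12 must be positive")
    w = (var2 - cov12) / denom                      # weight on the first estimate
    combined = w * est1 + (1.0 - w) * est2
    combined_var = (var1 * var2 - cov12 ** 2) / denom
    return combined, combined_var, w
```

For example, `combine_unbiased(2.0, 4.0, 1.0, 3.0)` puts weight 0.75 on the lower-variance estimate and returns a combined estimate of 2.5 with variance 0.75, smaller than either input variance.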
Kriging Asymptotics William Kleiber, Assistant Professor, University of Colorado Nov 9, 15:00 - 16:30 B1 L4 R4102 spatial statistics Spatial analyses often focus on spatial smoothing using the geostatistical technique known as kriging. Theoretical results regarding large-sample convergence rates of kriging predictors remain elusive. By casting kriging as a variational problem, we develop an equivalent kernel approximation technique that can also make large-data problems computationally feasible.
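For readers unfamiliar with kriging itself, the sketch below shows plain simple kriging in one dimension: the predictor at new locations is k0ᵀK⁻¹y and its variance is the sill minus k0ᵀK⁻¹k0. It assumes a known zero mean and a hypothetical exponential covariance with illustrative parameter names (`sill`, `length_scale`, `nugget`); it is the standard predictor, not the equivalent-kernel approximation developed in the talk.

```python
import numpy as np


def exp_cov(a, b, sill=1.0, length_scale=1.0):
    """Exponential covariance between 1-D location arrays a and b (assumed model)."""
    dist = np.abs(a[:, None] - b[None, :])
    return sill * np.exp(-dist / length_scale)


def simple_kriging(x_obs, y_obs, x_new, sill=1.0, length_scale=1.0, nugget=1e-10):
    """Simple kriging with a known zero mean: predictions and variances at x_new."""
    # Small nugget on the diagonal keeps the solve numerically stable.
    K = exp_cov(x_obs, x_obs, sill, length_scale) + nugget * np.eye(len(x_obs))
    k0 = exp_cov(x_obs, x_new, sill, length_scale)   # shape (n_obs, n_new)
    weights = np.linalg.solve(K, k0)                 # kriging weights K^{-1} k0
    pred = weights.T @ y_obs                         # best linear predictor
    var = sill - np.sum(k0 * weights, axis=0)        # sill - k0^T K^{-1} k0
    return pred, var
```

With observations at `x_obs = np.array([0.0, 1.0, 2.5])`, predicting at an observed location reproduces the observed value with near-zero variance, while predictions between observations carry positive variance below the sill. The direct O(n³) solve is what makes plain kriging expensive for large data.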
Workshop on Computational Space-Time Statistics Oct 4, 09:45 - Oct 6, 10:45 B1 L4 R4102 Statistics of extremes Environmental Statistics
Collective Estimation of Multiple Bivariate Density Functions with Application to Angular-sampling-based Protein Structure Prediction Mehdi Maadooliat, Assistant Professor, Marquette University Mar 10, 15:00 - 16:00 B1 statistics In this talk, we develop a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs. Each log density function in the collection is modeled as a linear combination of a common set of basis functions. The shared basis functions are modeled as bivariate splines on triangulations and are estimated from the data. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across the boundaries of the triangles.
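To make the model structure concrete, the sketch below evaluates a collection of densities that share one set of basis functions: each population j has its own coefficient vector c_j, and its density is exp(B(x)·c_j) normalized over the torus [-π, π)². As a deliberate substitution, it uses a few low-order Fourier terms as the shared basis instead of the bivariate splines on triangulations described in the talk; periodic terms respect the circularity of angles automatically. It only evaluates a model with given coefficients; it does not estimate the coefficients or the basis, and the function names are hypothetical.

```python
import numpy as np


def periodic_basis(phi, psi):
    """Hypothetical shared basis: low-order Fourier terms in two angles.

    This replaces the triangulated bivariate splines of the talk.
    """
    return np.stack([
        np.cos(phi), np.sin(phi),
        np.cos(psi), np.sin(psi),
        np.cos(phi - psi), np.sin(phi - psi),
    ], axis=-1)


def log_basis_density(coefs, n_grid=90):
    """Evaluate exp(B(x) @ coefs) on a grid over [-pi, pi)^2, normalized to integrate to 1."""
    grid = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    phi, psi = np.meshgrid(grid, grid, indexing="ij")
    log_f = periodic_basis(phi, psi) @ coefs    # log density up to an additive constant
    f = np.exp(log_f - log_f.max())             # subtract the max before exponentiating
    cell_area = (2 * np.pi / n_grid) ** 2
    f /= f.sum() * cell_area                    # Riemann-sum normalization on the grid
    return grid, f
```

A collection of populations is then just a coefficient matrix with one row per population, for example `coefs = np.array([[1.0, 0.0, 0.5, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.8, 0.3, 0.0]])`, with every row reusing the same basis. Sharing the basis is what lets populations with few observations borrow strength from the others.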
Bayesian Regression Trees, Nonparametric Heteroscedastic Regression Modeling and MCMC Sampling Matthew Pratola, Assistant Professor of Statistics, The Ohio State University Nov 24, 15:00 - 16:00 B1 L2 nonparametric statistics Bayesian Statistics In this talk, we introduce a new Bayesian regression tree model that allows for possible heteroscedasticity in the variance model and devise novel MCMC samplers that appear to adequately explore the posterior tree space of this model.
Uncertainty Quantification of Tsunami Models Serge Guillas, Professor of Statistics, University College London (UCL) Sep 8, 15:00 - 16:00 B1 uncertainty quantification Environmental Statistics In this talk, we first show various strategies for the efficient emulation of simulators with uncertain inputs, with applications to tsunami wave modelling. The outer product emulator provides a fast surrogate for the simulator's time series of outputs.
Parametric Problems, Stochastic, and Identification By Prof. Hermann Matthies (ISCTUB, Germany) Prof. Hermann Matthies, Institute of Scientific Computing TU Braunschweig, Germany Mar 6, 15:00 - 16:00 B1 R4102 Parameter identification problems are formulated in a probabilistic language, where the randomness reflects the uncertainty in our knowledge of the true values. This setting makes it conceptually easy to incorporate new information, e.g. from a measurement, by connecting it to Bayes's theorem. The unknown quantity is modelled as a (possibly high-dimensional) random variable. Such a description has two constituents: the measurable function and the measure.
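In the simplest special case, a Gaussian prior on a finite-dimensional parameter q and a linear measurement y = Hq + e with Gaussian noise e ~ N(0, R), the Bayesian update has a closed form. The sketch below assumes exactly that linear-Gaussian setting; the function name and the model are illustrative assumptions, not the general (possibly non-Gaussian, high-dimensional) identification method of the talk.

```python
import numpy as np


def gaussian_bayes_update(prior_mean, prior_cov, H, R, y):
    """Posterior of q ~ N(prior_mean, prior_cov) given y = H q + e, e ~ N(0, R)."""
    S = H @ prior_cov @ H.T + R                   # covariance of the predicted measurement
    K = np.linalg.solve(S, H @ prior_cov).T       # gain P H^T S^{-1} (P, S symmetric)
    post_mean = prior_mean + K @ (y - H @ prior_mean)
    post_cov = prior_cov - K @ H @ prior_cov
    return post_mean, post_cov
```

With a standard normal prior, unit measurement noise and an observation y = 2 of q itself, the posterior mean is 1 and the posterior variance 0.5. In a two-dimensional example where only the first component is measured, the second component is still updated through its prior correlation with the first.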
Scalable Hierarchical Algorithms for eXtreme Computing Workshop David Keyes, Senior Associate to the President, King Abdullah University of Science and Technology Apr 28, 08:00 - Apr 30, 16:00 KAUST scientific computing The 2012 SHAX-C workshop focuses international expert attention on the prospects for the three great hierarchical algorithms of scientific computing: multigrid, fast transforms, and fast multipole methods. These methods are kernels in simulations based on formulations of partial differential equations, integral equations, and interacting particles – in short, they are scientific and engineering workhorses.