Thursday, November 19, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
In this talk we consider the problem of estimating the score function (the gradient of the log-likelihood) associated with a class of partially observed diffusion processes, with discretely observed, fixed-length data and finite-dimensional parameters. We construct an estimator that is unbiased, with no time-discretization bias. Using a simple Girsanov change of measure to represent the score function, our methodology can be used for a wide class of diffusion processes and requires only access to a time-discretization method such as Euler-Maruyama. Our approach is based upon a novel adaptation of the randomization schemes developed by Glynn and co-authors, along with a new coupled Markov chain simulation scheme. The latter methodology is an original type of coupling of the coupled conditional particle filter. We prove that our estimator is unbiased and of finite variance. We then illustrate our methodology on several challenging statistical examples. This is joint work with Jeremy Heng (ESSEC, Singapore) and Jeremie Houssineau (Warwick, UK).
Thursday, October 22, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
Due to the well-known computational showstopper of the exact Maximum Likelihood Estimation (MLE) for large geospatial observations, a variety of approximation methods have been proposed in the literature, which usually require tuning certain inputs. For example, the Tile Low-Rank approximation (TLR) method, a recently developed efficient technique using parallel hardware architectures, involves many tuning parameters including the numerical accuracy, which needs to be selected according to the features of the true process. To properly choose the tuning parameters, it is crucial to adopt a meaningful criterion for the assessment of the prediction efficiency with different inputs. Unfortunately, the most commonly-used mean square prediction error (MSPE) criterion cannot directly assess the loss of efficiency when the spatial covariance model is approximated. In this paper, we present two other criteria, the Mean Loss of Efficiency (MLOE) and Mean Misspecification of the Mean Square Error (MMOM), and show numerically that, in comparison with the common MSPE criterion, the MLOE and MMOM criteria are more informative, and thus more adequate to assess the loss of the prediction efficiency by using the approximated or misspecified covariance models. Thus, our suggested criteria are more useful for the determination of tuning parameters for sophisticated approximation methods of spatial model fitting. To illustrate this, we investigate the trade-off between the execution time, estimation accuracy, and prediction efficiency for the TLR method with intensive simulation studies and suggest proper settings of the TLR tuning parameters. We then apply the TLR method to a large spatial dataset of soil moisture in the area of the Mississippi River basin, showing that with our suggested tuning parameters, the TLR method is more efficient in prediction than the Gaussian predictive process method, which is a typical low-rank based approximation.
Thursday, October 15, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
Compartmental epidemiological models are among the simplest models for the spread of a disease. They are based on statistical models of interactions in large populations and can be effective in the appropriate circumstances. Their application, historically and in the present pandemic, has sometimes been successful and sometimes spectacularly wrong. In this talk I will review some of these models and their application. I will also discuss the behavior of the corresponding dynamical systems and how the theory of optimal control can be applied to them. I will describe some of the challenges in using such a theory to make decisions about public policy.
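As a concrete instance of a compartmental model, the sketch below integrates the classic SIR (susceptible, infected, recovered) equations with a forward Euler step. It is only an illustration: the abstract does not say which models the talk covers, and the function name, parameter values, and step size are assumptions chosen for the example.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.01):
    """Integrate the SIR equations with forward Euler (illustrative sketch).

    s, i, r are population fractions:
        ds/dt = -beta * s * i
        di/dt =  beta * s * i - gamma * i
        dr/dt =  gamma * i
    beta is the transmission rate and gamma the recovery rate (assumed values).
    Returns a list of (time, s, i, r) tuples.
    """
    s, i, r = s0, i0, r0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    # Hypothetical parameters: basic reproduction number beta / gamma = 3.
    traj = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01, r0=0.0, days=160)
    peak_time, _, peak_infected, _ = max(traj, key=lambda row: row[2])
    print(f"peak infected fraction {peak_infected:.3f} at day {peak_time:.1f}")
```

An optimal control formulation would typically make beta a time-dependent control variable and penalize both the infected fraction and the cost of the intervention; that step is not shown here.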
Thursday, October 08, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
Big data analytics and large-scale simulations have followed largely independent paths to the high-performance computing frontier, but important opportunities now arise that can be addressed by combining the strengths of each. As a prominent big data application, geospatial statistics is increasingly performance-bound. We present Exascale GeoStatistics (ExaGeoStat) software, a high-performance library implemented on a wide variety of contemporary hybrid distributed-shared supercomputers whose primary target is climate and environmental prediction applications. Such software is destined to play an important role at the intersection of big data and extreme simulation by allowing applications with prohibitively large memory footprints to be deployed at scales worthy of the data on modern architectures by exploiting recent algorithmic developments in computational linear algebra. In contrast to simulation based on partial differential equations derived from first-principles modeling, ExaGeoStat employs a statistical model based on the evaluation of the Gaussian log-likelihood function, which operates on a large dense covariance matrix. A relatively small ensemble of expensive simulations can be used to parameterize a statistical model from which inexpensive emulations can be drawn after a parameter fitting process. For the dense covariance matrix operations of geospatial statistics to keep up with the growing scale of data sets from the sparse Jacobian operations of PDE simulations, data sparsity intrinsic in the physics must be identified and exploited. Parameterized by the Matern covariance function, the covariance matrix is symmetric and positive definite. The computational tasks involved during the evaluation of the Gaussian log-likelihood function become daunting as the number n of geographical locations grows, as O(n^2) storage and O(n^3) operations are required.
While ExaGeoStat's distributed capability extends traditional "exact" linear algebra approaches, the library supports several approximate techniques that reduce the complexity of the maximum likelihood operation while respecting user-specified accuracy. For example, ExaGeoStat supports the Tile Low-Rank (TLR) approximation technique, which exploits the data sparsity of the dense covariance matrix by compressing the off-diagonal tiles up to a user-defined accuracy threshold. Because many environmental characteristics show spatial continuity, i.e., data at two nearby locations are on average more similar than data at two widely spaced locations, other approximations become valid and are provided by ExaGeoStat, such as diagonal-super tile and mixed-precision approximation methods, whereby the less significant correlations that comprise the vast majority of entries in the covariance matrix are stored in lower precisions than the defaults for tightly coupled degrees of freedom.
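To make the O(n^2) storage and O(n^3) operation counts concrete, here is a minimal pure-Python sketch of an exact Gaussian log-likelihood evaluation through a dense Cholesky factorization. It is not ExaGeoStat code and does not use its API. The function names are hypothetical, the mean is assumed to be zero, and the covariance is the exponential model (the Matern family with smoothness 1/2), chosen only because it needs no special functions.

```python
import math


def exponential_covariance(locations, variance, length_scale):
    """Dense n x n covariance matrix: O(n^2) storage (assumed exponential model)."""
    n = len(locations)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(locations[i], locations[j])
            cov[i][j] = variance * math.exp(-d / length_scale)
    return cov


def cholesky(a):
    """Lower-triangular factor L with a = L L^T: O(n^3) operations."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        diag = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if diag <= 0.0:
            raise ValueError("matrix is not positive definite")
        low[j][j] = math.sqrt(diag)
        for i in range(j + 1, n):
            partial = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = (a[i][j] - partial) / low[j][j]
    return low


def forward_substitution(low, b):
    """Solve L y = b for lower-triangular L."""
    y = [0.0] * len(b)
    for i in range(len(b)):
        y[i] = (b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    return y


def gaussian_log_likelihood(observations, locations, variance, length_scale):
    """Zero-mean Gaussian log-likelihood:
    -0.5 * (n log(2 pi) + log det(Sigma) + z^T Sigma^{-1} z)."""
    n = len(observations)
    low = cholesky(exponential_covariance(locations, variance, length_scale))
    y = forward_substitution(low, observations)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    quadratic_form = sum(v * v for v in y)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quadratic_form)
```

Maximum likelihood estimation repeats this evaluation for many candidate parameter values, which is why the cubic factorization cost dominates and why approximations such as TLR target the factorization step.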
Jan Haskovec, AMCS, KAUST
Thursday, October 01, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
Individual-based models of collective behavior represent a very active research field with applications in physics (spontaneous magnetization), biology (flocking and swarming) and social sciences (opinion formation). They are also a hot topic in engineering (swarm robotics). A particularly interesting aspect of the dynamics of multi-agent systems is the emergence of global self-organized patterns, while individuals typically interact only on short scales. In this talk I shall discuss the impact of delay on asymptotic consensus formation in Hegselmann-Krause-type models, where agents adapt their "opinions" (in a broad sense) to the ones of their close neighbors. We shall understand the two principal types/sources of delay - information propagation and processing - and explain their qualitatively different impacts on the consensus dynamics. We then discuss various mathematical methods that provide asymptotic consensus results in the respective settings: Lyapunov functional-type approach, direct estimates, convexity arguments and forward-backward estimates.
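The difference between the two delay types can be sketched with a small simulation. The code below is an illustration under stated assumptions, not the formulation used in the talk: it takes a bounded-confidence interaction with a hard cutoff radius, a constant initial history, and a forward Euler step. With "transmission" delay an agent compares delayed opinions of others with its own current opinion; with "processing" delay it compares delayed opinions with its own delayed opinion. All names and parameter values are hypothetical.

```python
from collections import deque


def simulate_delayed_consensus(initial, radius, delay_steps, dt, steps,
                               delay_type="transmission"):
    """Euler simulation of a Hegselmann-Krause-type model with a fixed delay.

    Assumed dynamics for agent i (n agents, delay tau = delay_steps * dt):
        transmission: dx_i/dt = (1/n) * sum_j w * (x_j(t - tau) - x_i(t))
        processing:   dx_i/dt = (1/n) * sum_j w * (x_j(t - tau) - x_i(t - tau))
    where w = 1 if the compared opinions differ by at most `radius`, else 0.
    Returns the list of opinions after `steps` Euler steps.
    """
    if delay_type not in ("transmission", "processing"):
        raise ValueError("delay_type must be 'transmission' or 'processing'")
    n = len(initial)
    # history[0] is the state delay_steps steps ago, history[-1] the current one.
    history = deque([list(initial) for _ in range(delay_steps + 1)],
                    maxlen=delay_steps + 1)
    for _ in range(steps):
        current = history[-1]
        delayed = history[0]
        updated = []
        for i in range(n):
            own = current[i] if delay_type == "transmission" else delayed[i]
            drift = 0.0
            for j in range(n):
                gap = delayed[j] - own
                if abs(gap) <= radius:
                    drift += gap
            updated.append(current[i] + dt * drift / n)
        history.append(updated)  # the oldest state drops out automatically
    return list(history[-1])


if __name__ == "__main__":
    opinions = [0.0, 0.1, 0.9, 1.0]
    for kind in ("transmission", "processing"):
        final = simulate_delayed_consensus(opinions, radius=2.0, delay_steps=5,
                                           dt=0.05, steps=2000, delay_type=kind)
        print(kind, [round(x, 4) for x in final])
```

In this sketch the processing-delay variant conserves the mean opinion, because its interaction terms are antisymmetric in i and j, while the transmission-delay variant generally does not.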
Thursday, September 17, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
In this work, we estimate extreme sea surface temperature (SST) hotspots, i.e., high threshold exceedance regions, for the Red Sea, a vital region of high biodiversity. We analyze high-resolution satellite-derived SST data comprising daily measurements at 16703 grid cells across the Red Sea over the period 1985–2015. We propose a semiparametric Bayesian spatial mixed-effects linear model with a flexible mean structure to capture spatially-varying trend and seasonality, while the residual spatial variability is modeled through a Dirichlet process mixture (DPM) of low-rank spatial Student-t processes (LTPs). By specifying cluster-specific parameters for each LTP mixture component, the bulk of the SST residuals influence tail inference and hotspot estimation only moderately. Our proposed model has a nonstationary mean, covariance and tail dependence, and posterior inference can be drawn efficiently through Gibbs sampling. In our application, we show that the proposed method outperforms some natural parametric and semiparametric alternatives.
Thursday, September 10, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
When constructing high-order schemes for solving hyperbolic conservation laws with multi-dimensional finite volume schemes, the corresponding high-order reconstructions are commonly performed in characteristic spaces to eliminate spurious oscillations as much as possible. For multi-dimensional finite volume schemes, we need to perform the characteristic decomposition several times in different normal directions of the target cell, which is very time-consuming. We propose a rotated characteristic decomposition technique that requires only one-time decomposition for multi-dimensional reconstructions. This technique not only reduces the computational cost remarkably, but also controls spurious oscillations effectively. We take a third-order weighted essentially non-oscillatory finite volume scheme for solving the Euler equations as an example to demonstrate the efficiency of the proposed technique. We apply the new methodology to the simulation of instabilities in direct initiation of gaseous detonations in free space.
Thursday, September 03, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/95474758108?pwd=WkwrdiszTE1uYTdmR3JRK09LVDErZz09
This talk discusses the concept of correlation and how to interpret it alone (marginally) or within a more complex environment (conditionally). This rather simple distinction is the key observation behind a lot of exciting developments and connections in statistics that can be leveraged for improved computations and better motivated statistical models.
Thursday, April 30, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/706745599
In many problems in statistical signal processing, regularization is employed to deal with uncertainty, ill-posedness, and insufficiency of training data. It is possible to tune these regularizers optimally in the asymptotic regime, i.e., when the dimension of the problem becomes very large, by using tools from random matrix theory and Gaussian process theory. In this talk, we demonstrate the optimal tuning of regularization for three problems: 1) regularized least squares for solving ill-posed and/or uncertain linear systems, 2) regularized least squares for signal detection in multiple-antenna communication systems, and 3) regularized linear and quadratic discriminant binary classifiers.
Thursday, April 16, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/706745599
Transcription factors are an important family of proteins that control the rate of transcription from DNA to messenger RNA by binding to specific DNA sequences. Transcription factor regulation is thus fundamental to understanding not only the system-level behaviors of gene regulatory networks, but also the molecular mechanisms underpinning endogenous gene regulation. In this talk, I will introduce our efforts to develop novel optimization and deep learning methods for quantitatively understanding transcription factor regulation at the network and molecular levels. Specifically, I will talk about how we estimate the kinetic parameters from sparse time-series readouts of gene circuit models, and how we model the relationship between transcription factor binding sites and their binding affinities.
Thursday, April 09, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/706745599
An important stream of research in computational design aims at digital tools which support users in realizing their design intent in a simple and intuitive way, while simultaneously taking care of key aspects of function and fabrication. Such tools are expected to shorten the product development cycle through a reduction of costly feedback loops between design, engineering and fabrication. The strong coupling between shape generation, function and fabrication is a rich source for the development of new geometric concepts, with an impact on the original applications as well as on geometric theory. This will be illustrated by means of applications in architecture and fabrication, with a mathematical focus on discrete differential geometry and geometric optimization problems.
Monday, April 06, 2020, 16:00
- 18:00
https://kaust.zoom.us/j/3520039297
The thesis focuses on the computation of high-dimensional multivariate normal (MVN) and multivariate Student-t (MVT) probabilities. Firstly, a generalization of the conditioning method for MVN probabilities is proposed and combined with the hierarchical matrix representation. Next, I revisit the Quasi-Monte Carlo (QMC) method and improve the state-of-the-art QMC method for MVN probabilities with block reordering, resulting in a tenfold speedup. The thesis proceeds to discuss a novel matrix compression scheme using Kronecker products. This matrix compression method has a memory footprint smaller than that of hierarchical matrices by more than one order of magnitude. A Cholesky factorization algorithm is correspondingly designed and shown to accomplish the factorization in 1 million dimensions within 600 seconds. To make the computational methods for MVN probabilities more accessible, I introduce an R package that implements the methods developed in this thesis and show that the package is currently the most scalable package for computing MVN probabilities in R. Finally, as an application, I derive the posterior properties of the probit Gaussian random field and show that the R package I introduce makes model selection and posterior prediction feasible in high dimensions.
Thursday, April 02, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/706745599
This talk presents a new classification method for functional data. We consider the case where different groups of functions have similar means, so that it is difficult to classify them based on the mean function alone. To overcome this limitation, we propose the second moment based functional classifier (SMFC). We demonstrate that the new method is sensitive to divergence in the second moment structure and thus produces a lower misclassification rate than competing methods. Our method uses the Hilbert-Schmidt norm to measure the divergence of the second moment structure. One important innovation of our classification procedure lies in the dimension reduction step. The method data-adaptively discovers the basis functions that best capture the discrepancy between the second moment structures of the groups, rather than using the functional principal components of each individual group. Because unnecessary variability is removed, the classification accuracy improves. Consistency properties of the classification procedure and the relevant estimators are established. A simulation study and real data analyses of phoneme and rat brain activity trajectories empirically validate the superiority of the proposed method.
Thursday, March 12, 2020, 12:00
- 13:00
https://kaust.zoom.us/j/255432702
Functional data analysis is a very active research area due to the overwhelming existence of functional data. In the first part of this talk, I will introduce how functional data depth is used to carry out exploratory data analysis and explain recently-developed depth techniques. In the second part, I will discuss spatio-temporal statistical modeling. It is challenging to build realistic space-time models and assess the validity of the model, especially when datasets are large. I will present a set of visualization tools we developed using functional data analysis techniques for visualizing covariance structures of univariate and multivariate spatio-temporal processes. I will illustrate the performance of the proposed methods in the exploratory data analysis of spatio-temporal data. To join the event please go to https://kaust.zoom.us/j/255432702 .
Thursday, March 05, 2020, 12:00
- 13:00
Building 9, Level 2, Room 2322
In the lecture we present a three-dimensional model for the simulation of signal processing in neurons. To handle problems of this complexity, new mathematical methods and software tools are required. In recent years, new approaches such as parallel adaptive multigrid methods and corresponding software tools have been developed, making it possible to treat problems of huge complexity. Part of this approach is a method to reconstruct the geometric structure of neurons from data measured by 2-photon microscopy. Being able to reconstruct neural geometries and network connectivities from measured data is the basis of understanding the coding of motoric perceptions and long-term plasticity, which is one of the main topics of neuroscience. Other issues are compartment models and upscaling.
Sigrunn Sorbye, Associate Professor, UiT The Arctic University of Norway
Thursday, February 20, 2020, 12:00
- 13:00
Building 9, Level 2, Room 2322
In this talk I will discuss statistical models which incorporate the temperature response to radiative forcing components. The models can be used to estimate important climate sensitivity measures and give temperature forecasts. Bayesian inference is carried out using the methodology of integrated nested Laplace approximation and Monte Carlo simulations. The resulting approach will be demonstrated by analyzing instrumental data and Earth system model ensembles.
Thursday, February 06, 2020, 12:00
- 13:00
Building 9, Level 2, Lecture Hall 1
Author of more than 290 journal and conference publications, Professor Stenchikov's research interests are in multi-scale modeling of environmental processes and numerical methods; global climate change, climate downscaling, atmospheric convection; assessment of anthropogenic impacts and geoengineering; air-sea interaction, evaluating environmental consequences of catastrophic events like volcanic eruptions, nuclear explosions, forest and urban fires; and air pollution, transport of aerosols, chemically and optically active atmospheric tracers, their radiative forcing and effect on climate.
Paula Moraga, Lecturer, Department of Mathematical Sciences, University of Bath, UK
Wednesday, February 05, 2020, 12:00
- 13:00
Building 9, Level 2, Hall 2
In this talk, I will give an overview of my research, which focuses on the development of innovative statistical methods and interactive visualization applications for geospatial data analysis and health surveillance. I will illustrate some of my projects in the following areas: 1. Development of new statistical methodology; 2. Development of open-source statistical software such as R packages; 3. Health surveillance projects. Finally, I will describe my future research on innovation in data acquisition and visualization, precision disease mapping, and digital health surveillance, and how it can inform policymaking and improve population health globally.
Prof. Daniele Durante, Department of Decision Sciences, Bocconi University, Italy
Wednesday, November 27, 2019, 15:30
- 16:30
B1 L4 room 4102

Abstract

There are several Bayesian models where the posterior density

Prof. Ben Zhao, Computer Science, University of Chicago, USA
Monday, November 25, 2019, 12:00
- 13:00
Building 9, Level 2, Hall 1, Room 2322
In this talk, I will describe two recent results on detecting and understanding backdoor attacks on deep learning systems. I will first present Neural Cleanse (IEEE S&P 2019), the first robust tool to detect a wide range of backdoors in deep learning models. We use the idea of perturbation distances between classification labels to detect when a backdoor trigger has created shortcuts to misclassification to a particular label. Second, I will summarize our new work on Latent Backdoors (CCS 2019), a stronger type of backdoor attack that is more difficult to detect and survives retraining in commonly used transfer learning systems. Latent backdoors are robust and stealthy, even against the latest detection tools (including Neural Cleanse).
Thursday, November 21, 2019, 12:00
- 13:00
Building 9, Level 2, Hall 1, Room 2322
I will present an overview of our activities around estimation problems for partial and fractional differential equations. I will present the methods and algorithms we develop for state, source, and parameter estimation and illustrate the results with some simulations and real applications.
Monday, November 18, 2019, 00:00
- 23:45
Auditorium 0215, between building 2 and 3
2019 Statistics and Data Science Workshop confirmed speakers include Prof. Alexander Aue, University of California Davis, USA, Prof. Francois Bachoc, University Toulouse 3, France, Prof. Rosa M. Crujeiras Casais, University of Santiago de Compostela, Spain, Prof. Emanuele Giorgi, Lancaster University, UK, Prof. Jeremy Heng, ESSEC Asia-Pacific, Singapore, Prof. Birgir Hrafnkelsson, University of Iceland, Iceland, Prof. Ajay Jasra, KAUST, Saudi Arabia, Prof. Emtiyaz Khan, RIKEN Center for Advanced Intelligence Project, Japan, Prof. Robert Krafty, University of Pittsburgh, USA, Prof. Guido Kuersteiner, University of Maryland, USA, Prof. Paula Moraga, University of Bath, UK, Prof. Tadeusz Patzek, KAUST, Saudi Arabia, Prof. Brian Reich, North Carolina State University, USA, Prof. Dag Tjostheim, University Bergen, Norway, Prof. Xiangliang Zhang, KAUST, Saudi Arabia, Sylvia Rose Esterby, University of British Columbia, Canada, Prof. Abdel El-Shaarawi, Retired Professor at the National Water Research Institute, Canada.
Prof. David Bolin, Statistics, KAUST
Thursday, November 14, 2019, 12:00
- 13:00
Building 9, Level 2, Hall 1, Room 2322
The talk will give an overview of some recent developments of statistical models based on stochastic partial differential equations. We will in particular focus on equations with non-local differential operators or non-Gaussian driving noise, and explain when and why such models are useful. As motivating applications, analysis of longitudinal medical data and ocean waves will be considered.