Events | STAT | Statistics

In this talk, I will describe the family of mean-mixtures of multivariate normal distributions and establish many of its properties, including stochastic representations, moments, and distributional shape characteristics.

Models and Inference for Non-Gaussian Random Vectors and Fields

Sagnik Mondal

PhD Student, Statistics

Sunday, November 12, 2023, 12:30 - 14:30

Building 5, Level 5, Room 5209

The multivariate Gaussian distribution is widely used in many statistical applications due to its appealing features. However, real-world data often violate its assumptions, showing skewness and/or tail-thickness.

Workshop

The KAUST 2023 Workshop on Statistics

Monday, November 06, 2023, 09:00 - 17:00

Building 18, Room 4203

Contact Person

Paula Moraga

The workshop will feature the latest research on statistical methods and modeling to address real-world challenges in health, environment, and sustainability.

Free Boundary Problems in Science and Engineering

Thursday, November 02, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

Free boundary problems arise naturally in a range of mathematical models that describe physical, biological or financial phenomena, such as the melting of ice into water, the dynamics of a population or the behavior of stock markets, to mention just a few.

Andre Victor Ribeiro Amaral

Spatial and Spatio-temporal Statistical Methods for Environment and Public Health Applications

PhD Student, Statistics

Thursday, November 02, 2023, 11:45 - 14:30

B9, L4, R4225

Contact Person

Andre Victor Ribeiro Amaral

This thesis proposes spatial and spatio-temporal statistical methods for addressing real-world challenges within the public health and environmental domains.

Spatio-temporal Bayesian analysis of excess mortality in 5 European countries in 2020

Prof. Virgilio Gómez-Rubio, Department of Mathematics, Universidad de Castilla-La Mancha, Spain

Thursday, October 26, 2023, 12:00 - 13:00

B9-L2-R2325

Abstract

The COVID-19 pandemic produced excess mortality in many countr

Latent Gaussian Spatial Modelling with Change of Support

Erick Chacon Montalvan, Postdoctoral fellow, Statistics Geohealth Group, KAUST

Thursday, October 19, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

Spatial data analysis commonly needs to deal with spatial data derived from multiple sources (e.g. satellites, stations, survey samples) with different supports, but associated with the same properties of a spatial phenomenon of interest. Usually, predictors are also measured on different spatial supports than the response variable.

Space–Time Multiscale Modeling and Numerics of Sorption Kinetics

Thursday, October 12, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

In this talk, we propose and validate a Space Multiscale model for the description of particle diffusion in the presence of trapping boundaries. We start from a drift-diffusion equation in which the drift term describes the effect of bubble traps and is modeled by the Lennard–Jones potential.

How many sample points do we need for least squares?

Thursday, October 05, 2023, 12:00 - 13:00

B9-L2-R2325

Abstract

The goal of the least squares method is to find the best linea

Decision Trees for Fault Diagnosis in Circuits and Switching Networks

Thursday, September 28, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

We study theoretical problems of fault diagnosis in circuits and switching networks, which are among the most fundamental models for computing Boolean functions.

Predictive Performance Test based on the Exhaustive Nested Cross-Validation for High-dimensional data

Iris Ivy Gauran

Postdoctoral Research Fellow, Biostatistics

Thursday, September 21, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

Contact Person

Iris Ivy Gauran

Cross-validation is an algorithmic technique extensively used for estimating the prediction error, tuning the regularization parameter, and choosing between competing predictive rules.

Goodness-of-fit tests for multivariate skewed distributions based on the characteristic function

Maicon J. Karling

Postdoctoral Fellow, Statistics

Thursday, September 14, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

Contact Person

Maicon J. Karling

Goodness-of-fit tests determine how well a set of observed data fits a particular probability distribution. They can also show if some categorical variable follows a hypothesized family of distributions.

Spatial Models and Extreme-Value Methods for Wildfire Risk Assessment

Daniela Cisneros

PhD Student, Statistics

Monday, September 11, 2023, 16:00 - 17:00

B3, L5, R5220

Contact Person

Raphaël Huser

The statistical modeling of spatial and extreme events provides a framework for the development of techniques and models to describe natural phenomena in a variety of environmental, geoscience, and climate science applications. In a changing climate, various natural hazards, such as wildfires, are believed to have evolved in frequency, size, and spatial extent, although regional responses may vary.

From Euler flows with friction to gradient flows and applications

Thursday, September 07, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

I will review some works on the high-friction limit (or small mass approximation) from Euler flows to advection-diffusion systems that are gradient flows, and related asymptotic problems in fluid mechanics. The formulation exploits the variational structure of compressible Euler flows and is connected to the interpretation of nonlinear Fokker-Planck systems as gradient flows in Wasserstein distance.

Efficient Intensity Estimation Using Adaptive Bandwidths on Linear Networks: A Study and Application in Traffic Accident Analysis

Jonatan A. González

Postdoctoral Fellow, Statistics

Thursday, August 31, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

Contact Person

Jonatan A. González

Estimating first-order intensity functions is crucial in the analysis of point patterns on linear networks, but selecting suitable bandwidths for non-parametric methods remains challenging. We propose an adaptive intensity estimator for the heat kernel that adjusts bandwidths based on the data points, a novel approach in this context.

Approximate Bayesian inference based on dense matrices and new features using INLA

Sunday, June 04, 2023, 15:00 - 16:00

B4, L5, R5220

Bayesian computational statistics

Contact Person

Haavard Rue

The Integrated Nested Laplace Approximations (INLA) method has become a commonly used tool for researchers and practitioners to perform approximate Bayesian inference in various fields of application. It has become essential to incorporate more complex models and expand the method’s capabilities with more features. In this dissertation, we contribute to the INLA method in several respects.

Seminar

Stochastic environmental modeling in a time of convergence: physics meets artificial intelligence

Prof. Stefano Castruccio, Associate professor, University of Notre Dame, USA

Sunday, June 04, 2023, 10:00 - 11:00

Building 1, Level 4, Room 4102

It is widely acknowledged that the relentless surge in the Volume, Velocity and Variety of data, together with the simultaneous increase in computational resources, has stimulated the development of data-driven methods with unprecedented flexibility and predictive power. However, not every environmental study entails a large data set: many applications, ranging from astronomy to paleo-climatology, have a high associated sampling cost and are instead constrained by physics-informed partial differential equations. Throughout the past few years, a new and powerful paradigm has emerged in the machine learning literature, merging data-driven and physics-informed problems and thus providing a unified framework for a whole spectrum of problems ranging from data-rich/context-poor to data-poor/context-rich. In this talk, I will present this new framework and discuss some of the most recent efforts to reformulate it as a stochastic model-based approach, thereby allowing calibrated uncertainty quantification.

Leave-Group-Out Cross-Validation for Latent Gaussian Models

Zhedong Liu

PhD Student, Statistics

Tuesday, May 30, 2023, 15:30 - 17:30

B1, R4102

Contact Person

Haavard Rue

The commonly used leave-one-out and K-fold cross-validation methods are not suitable for structured models with multiple prediction tasks. To overcome this limitation, we introduce leave-group-out cross-validation, which allows groups to adapt to different tasks. We propose an automatic group construction method and provide an efficient approximation for latent Gaussian models. Moreover, this method is conveniently implemented in the R-INLA software.

Dean's Distinguished Lecture

Criticism and robustification of latent Gaussian models

Rafael Medeiros Cabral

PhD Student, Statistics

Sunday, May 28, 2023, 15:00 - 16:00

B1, L4, R4102

latent Gaussian models

Contact Person

Haavard Rue

Latent Gaussian models (LGMs) are widely used but struggle with certain datasets that contain non-Gaussian features, such as sudden jumps or spikes. This dissertation aims to provide tools for researchers to check the adequacy of the fitted LGM (criticism) and, if the check fails, to offer efficient and user-friendly implementations of latent non-Gaussian models, which lead to more robust inferences (robustification).

New Graphical Displays for Classification

Peter Rousseeuw, Professor Emeritus, Statistics and Data Science, KU Leuven, Belgium

Tuesday, May 09, 2023, 15:00 - 16:00

Building 9, Level 2, Room 2325

Dean's Distinguished Lecture

Classification is a major tool of statistics and machine learning. Several classifiers have interesting visualizations of their inner workings. Here we pursue a different goal, which is to visualize the cases being classified, either in training data or in test data. An important aspect is whether a case has been classified to its given class (label) or whether the classifier wants to assign it to a different class. This is reflected in the probability of the alternative class (PAC). A high PAC indicates label bias, i.e. the possibility that the case was mislabeled. The PAC is used to construct a silhouette plot which is similar in spirit to the silhouette plot for cluster analysis. The average silhouette width can be used to compare different classifications of the same dataset. We will also draw quasi residual plots of the PAC versus a data feature, which may lead to more insight into the data. One of these data features is how far each case lies from its given class, yielding so-called class maps. The proposed displays are constructed for discriminant analysis, k-nearest neighbors, support vector machines, CART, random forests, and neural networks. The graphical displays are illustrated and interpreted on data sets containing images, mixed features, and texts.

Detecting Cellwise Outliers in Your Data

Peter Rousseeuw, Professor Emeritus, Statistics and Data Science, KU Leuven, Belgium

Tuesday, May 09, 2023, 12:00 - 13:00

Building 9, Level 2, Room 2325

A multivariate dataset consists of n cases in d dimensions, and is often stored in an n by d data matrix. It is well known that real data may contain outliers. Depending on the situation, outliers may be (a) undesirable errors which can adversely affect the data analysis, or (b) valuable nuggets of unexpected information. In statistics and data analysis the word outlier usually refers to a row of the data matrix, and the methods to detect such outliers only work when at least half the rows are clean. But often many rows have a few contaminated cell values, which may not be visible by looking at each variable (column) separately. We describe a method to detect deviating data cells in a multivariate sample which takes the correlations between the variables into account. It has no restriction on the number of clean rows, and can deal with high dimensions. Other advantages are that it provides predicted values of the outlying cells, while imputing missing values at the same time. We illustrate the method on several real data sets, where it uncovers more structure than found by purely columnwise methods or purely rowwise methods. The proposed method can help to diagnose why a certain row is outlying, e.g. in process control. It also serves as an initial step for estimating multivariate location and scatter matrices, and for cellwise robust principal component analysis.

Sub-dimensional Mardia measures of multivariate skewness and kurtosis

Joydeep Chowdhury

Postdoctoral Fellow, Statistics

Monday, May 01, 2023, 12:00 - 13:00

Building 9, Level 3, Room 3128

The Mardia measures of multivariate skewness and kurtosis summarize the respective characteristics of a multivariate distribution with two numbers. However, these measures do not reflect the sub-dimensional features of the distribution. Consequently, testing procedures based on these measures may fail to detect skewness or kurtosis present in a sub-dimension of the multivariate distribution. We introduce sub-dimensional Mardia measures of multivariate skewness and kurtosis, and investigate the information they convey about all sub-dimensional distributions of some symmetric and skewed families of multivariate distributions.

Modeling and Inference for Multivariate Time Series, with Applications to Integer-Valued Processes and Nonstationary Extreme Data

Matheus B. Guerrero

PhD Student, Statistics

Tuesday, April 04, 2023, 16:00 - 19:00

B4, L5, R5220

Contact Person

Raphaël Huser

This Ph.D. research focuses on proposing new statistical methods for two types of time series data: integer-valued data and multivariate nonstationary extreme data. For the former, the researcher proposes a novel approach to building an integer-valued autoregressive (INAR) model that offers the flexibility to specify both marginal and innovation distributions, leading to several new INAR processes. For the latter, the researcher proposes new extreme value theory methods for analyzing multivariate nonstationary extreme data, specifically EEG recordings from patients with epilepsy. Two extreme-value methods, Conex-Connect and Club Exco, are proposed to study alterations in the brain network during extreme events such as epileptic seizures.