The Statistics Program at KAUST is proud to host the **2018 Workshop on Statistics and Data Science**. This workshop gathers leading experts in statistical data science to discuss the current needs, challenges, and opportunities of modeling massive, high-dimensional data and predicting complex biological and physical processes.

The workshop will run from November 12-14. Talks and poster presentations will take place in Auditorium 0215 (between Buildings 4 & 5).


7:50

Walk from the lobby of KAUST Inn #1 with post-doc Carolina Euan

8:00 – 8:50

Coffee and pastries

8:50 - 9:00

Getting organized: Hernando Ombao, Workshop Chair

9:00 - 9:15

Statistics Welcome: Marc Genton, Statistics Program Chair

9:15 - 9:30

President’s Remarks: President Tony Chan

9:30 - 9:55

Alumni Address: Mohammed Alsobay, MS 2017

##### Session 1 Chair: Raphael Huser, KAUST

10:00 - 10:30

__Abstract:__

We develop a new family of variance-reduced stochastic gradient descent methods for minimizing the average of a very large number of smooth functions. Our method, JacSketch, is motivated by novel developments in randomized numerical linear algebra, and operates by maintaining a stochastic estimate of a Jacobian matrix composed of the gradients of individual functions. In each iteration, JacSketch efficiently updates the Jacobian matrix by first obtaining a random linear measurement of the true Jacobian through (cheap) sketching, and then projecting the previous estimate onto the solution space of a linear matrix equation whose solutions are consistent with the measurement. The Jacobian estimate is then used to compute a variance-reduced unbiased estimator of the gradient, followed by a stochastic gradient descent step. Our strategy is analogous to the way quasi-Newton methods maintain an estimate of the Hessian, and hence our method can be seen as a *stochastic quasi-gradient method*. Indeed, quasi-Newton methods project the current Hessian estimate onto a solution space of a linear equation consistent with a certain linear (but non-random) measurement of the true Hessian. Our method can also be seen as stochastic gradient descent applied to a *controlled stochastic optimization reformulation* of the original problem, where the control comes from the Jacobian estimates. We prove that for smooth and strongly convex functions, JacSketch converges linearly with a meaningful rate dictated by a single convergence theorem which applies to general sketches. We also provide a refined convergence theorem which applies to a smaller class of sketches, featuring a novel proof technique based on a *stochastic Lyapunov function*. This enables us to obtain sharper complexity results for variants of JacSketch with importance sampling.
By specializing our general approach to specific sketching strategies, JacSketch reduces to the celebrated stochastic average gradient (SAGA) method, and its several existing and many new minibatch, reduced memory, and importance sampling variants. Our rate for SAGA with importance sampling is the current best-known rate for this method, resolving a conjecture by Schmidt et al. (2015). The rates we obtain for minibatch SAGA are also superior to existing rates. Moreover, we obtain the first minibatch SAGA method with importance sampling. This is joint work with Robert M. Gower (Telecom ParisTech) and Francis Bach (INRIA).
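
The SAGA special case recovered by JacSketch is easy to sketch. Below is a minimal Python illustration on a toy least-squares problem (the data, step size, and iteration budget are invented for the example and are not from the talk): each iteration refreshes one stored gradient, i.e., one row of the Jacobian estimate, and takes a variance-reduced step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10                        # number of component functions, dimension
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d)        # consistent system: minimum of f is 0

# f(x) = (1/n) * sum_i 0.5 * (a_i.x - b_i)^2; grad f_i(x) = (a_i.x - b_i) a_i
x = np.zeros(d)
J = np.zeros((n, d))                  # stored per-function gradients ("Jacobian")
g_avg = np.zeros(d)                   # running average of the stored gradients
step = 1.0 / (4.0 * np.max(np.sum(A**2, axis=1)))   # conservative step size

for _ in range(20000):
    i = rng.integers(n)
    g_new = (A[i] @ x - b[i]) * A[i]          # fresh gradient of f_i at x
    x -= step * (g_new - J[i] + g_avg)        # variance-reduced gradient step
    g_avg += (g_new - J[i]) / n               # keep the stored average consistent
    J[i] = g_new                              # overwrite one Jacobian row
```

The table `J` plays the role of the Jacobian estimate; replacing a single row per iteration is what makes each step cheap while keeping the gradient estimator unbiased.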

__Biography:__

Peter Richtarik is an Associate Professor of Computer Science and Mathematics at KAUST and an Associate Professor of Mathematics at the University of Edinburgh. He is an EPSRC Fellow in Mathematical Sciences, a Fellow of the Alan Turing Institute, and is affiliated with the Visual Computing Center and the Extreme Computing Research Center at KAUST. Dr. Richtarik received his PhD from Cornell University in 2007, and then worked as a Postdoctoral Fellow in Louvain, Belgium, before joining Edinburgh in 2009 and KAUST in 2017. Dr. Richtarik's research interests lie at the intersection of mathematics, computer science, machine learning, optimization, numerical linear algebra, high performance computing, and applied probability. Through his recent work on randomized decomposition algorithms (such as randomized coordinate descent methods, stochastic gradient descent methods, and their numerous extensions, improvements, and variants), he has contributed to the foundations of the emerging fields of big data optimization, randomized numerical linear algebra, and stochastic methods for empirical risk minimization. Several of his papers have attracted international awards, including the SIAM SIGEST Best Paper Award, the IMA Leslie Fox Prize (2nd prize, twice), and the INFORMS Computing Society Best Student Paper Award (sole runner-up). He is the founder and organizer of the Optimization and Big Data workshop series.

10:30 - 10:45

Coffee Break

10:45 - 11:15

__Abstract:__

Argo floats measure seawater temperature and salinity in the upper 2,000 m of the global ocean. Statistical analysis of the resulting spatio-temporal data set is challenging due to its nonstationary structure and large size. We propose mapping these data using locally stationary Gaussian process regression where covariance parameter estimation and spatio-temporal prediction are carried out in a moving-window fashion. This yields computationally tractable nonstationary anomaly fields without the need to explicitly model the nonstationary covariance structure. We also investigate Student-t distributed fine-scale variation as a means to account for non-Gaussian heavy tails in ocean temperature data. Cross-validation studies comparing the proposed approach with the existing state-of-the-art demonstrate clear improvements in point predictions and show that accounting for the nonstationarity and non-Gaussianity is crucial for obtaining well-calibrated uncertainties. These techniques can ultimately be used to obtain improved estimates of ocean climate and dynamics. As an example, I will briefly describe ongoing work on applying these methods to estimating the heat content of the global ocean, a quantity that is of central importance for understanding changes in the Earth's climate system.
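
As a much-simplified, one-dimensional illustration of the moving-window idea (not the authors' implementation; the covariance model, parameter values, and function name are assumptions invented for the sketch), local prediction can be done by simple kriging restricted to observations inside a window around each target point:

```python
import numpy as np

def local_gp_predict(x_obs, y_obs, x_star, window=2.0, corr_range=0.5,
                     sill=1.0, nugget=0.01):
    """Simple kriging at x_star from observations within a moving window,
    using an exponential covariance with fixed (assumed known) parameters."""
    near = np.abs(x_obs - x_star) <= window            # moving-window selection
    xs, ys = x_obs[near], y_obs[near]
    # Covariance among the windowed observations (nugget on the diagonal)
    K = sill * np.exp(-np.abs(xs[:, None] - xs[None, :]) / corr_range)
    K += nugget * np.eye(len(xs))
    # Cross-covariance between x_star and the windowed observations
    k_star = sill * np.exp(-np.abs(xs - x_star) / corr_range)
    return k_star @ np.linalg.solve(K, ys)             # zero-mean simple kriging

# Noisy observations of a smooth field
rng = np.random.default_rng(0)
x_obs = np.linspace(0.0, 10.0, 200)
y_obs = np.sin(x_obs) + 0.1 * rng.standard_normal(200)
pred = local_gp_predict(x_obs, y_obs, 5.0)
```

In the actual methodology the covariance parameters would themselves be re-estimated within each moving window, which is what yields locally stationary yet globally nonstationary fields.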

__Biography:__

Dr. Mikael Kuusela is an Assistant Professor of Statistics and Data Science at Carnegie Mellon University where he develops statistical methods for analyzing large and complex data sets in the physical sciences. His recent work has had two focal points: 1) developing spatio-temporal interpolation methods for analyzing oceanographic data from Argo profiling floats and 2) uncertainty quantification in ill-posed inverse problems with applications to the unfolding problem at the Large Hadron Collider at CERN. He obtained his PhD in Statistics in July 2016 from EPFL in Lausanne, Switzerland. He then moved to the US where he was a postdoc at the University of Chicago and at SAMSI before joining Carnegie Mellon in August 2018. His BSc and MSc degrees are in Engineering Physics and Mathematics from Aalto University in Helsinki, Finland.

11:15 - 11:45

__Abstract:__

We present a statistical space-time characterization of the sub-grid variability of air-sea fluxes. Many physical phenomena occur at scales finer than the discretization scale, and these interact with the resolved scales. Hence, quantifying the influence of the sub-grid scales on the resolved scales is needed to better represent the resolved scales in the entire system. We evaluate the difference between the true turbulent fluxes and those calculated using area-averaged wind speeds. We investigate a space-time characterization of this discrepancy, conditioned on the low-resolution fields, with the view of developing a stochastic wind-flux parameterization. A locally stationary space-time model is used to characterize and model this error process. The study is performed on a high-resolution simulation over a large domain that extends across the Indian Ocean into the West Pacific.

__Biography:__

Julie Bessac received the B.Sc. degree in fundamental Mathematics and the M.S. degree in Probability and Statistics, respectively in 2008 and 2011 from the University of Rennes 1, Rennes, France. She received the Ph.D. degree in 2014 in applied Mathematics from the University of Rennes 1, Rennes, France. Between November 2014 and July 2017, she was a post-doctoral appointee in the Mathematics and Computer Science Division at Argonne National Laboratory, Argonne, IL. Since July 2017, she has been an assistant statistician at Argonne National Laboratory. Her research focuses on the statistical modeling and forecasting of weather data and simulations applied to renewable energy.

11:45 - 12:00

Discussion

12:00 - 1:30

Lunch break

##### Session 2 Chair: Marc Genton, KAUST

1:30 - 2:00

__Abstract:__

The permutation test is perhaps the most widely used nonparametric test procedure in brain imaging. It is known as the exact test in statistics since the distribution of the test statistic under the null hypothesis can be exactly computed if we can calculate all possible values of the test statistic under every possible permutation. Unfortunately, generating every possible permutation for large-scale brain image datasets such as HCP and ADNI, with thousands of images, is not practical. Many previous attempts at speeding up the permutation test rely on various approximation strategies, such as estimating the tail distribution with a known distribution. In this study, we show how to rapidly accelerate the permutation test without any approximation strategy by exploiting the algebraic structure of the permutation group. The method is applied to large-scale twin imaging studies in determining the heritability of brain regions and connections.
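
For contrast with the exact-algebra approach described above, a naive permutation test, whose cost is precisely what the speaker's method avoids, can be sketched as follows (a generic two-sample difference-in-means test, not the heritability statistic used in the talk):

```python
import numpy as np
from itertools import combinations

def permutation_test(x, y, n_perm=None, rng=None):
    """Two-sample permutation test for a difference in means.

    If n_perm is None, enumerate every split (exact test);
    otherwise draw n_perm random relabellings (Monte Carlo)."""
    pooled = np.concatenate([x, y])
    n, m = len(x), len(pooled)
    observed = abs(x.mean() - y.mean())
    if n_perm is None:
        # Exact: all C(m, n) ways of choosing which values form group 1
        stats = []
        for idx in combinations(range(m), n):
            mask = np.zeros(m, dtype=bool)
            mask[list(idx)] = True
            stats.append(abs(pooled[mask].mean() - pooled[~mask].mean()))
        stats = np.array(stats)
    else:
        rng = np.random.default_rng(0) if rng is None else rng
        stats = np.empty(n_perm)
        for k in range(n_perm):
            perm = rng.permutation(pooled)
            stats[k] = abs(perm[:n].mean() - perm[n:].mean())
    return float(np.mean(stats >= observed))   # p-value
```

The exact branch enumerates all C(m, n) relabellings, which is exactly what becomes infeasible for datasets with thousands of images.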

__Biography:__

Moo K. Chung, Ph.D. is an Associate Professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin-Madison. He is also affiliated with the Department of Statistics and the Waisman Laboratory for Brain Imaging and Behavior. Dr. Chung’s main research area is computational neuroimaging, where noninvasive brain imaging modalities such as magnetic resonance imaging (MRI), diffusion tensor imaging (DTI) and functional MRI are used to map the spatiotemporal dynamics of the human brain. His research concentrates on the methodological development required for quantifying and contrasting functional and anatomical shape variations in both normal and clinical populations using various mathematical, statistical and computational techniques. He recently won an NIH BRAIN Initiative Award for scalable algorithm development for large-scale brain networks using persistent homology. His third book, titled Brain Network Analysis, will be published in early 2019 by Cambridge University Press.

2:00 - 2:30

__Abstract:__

Disentangling multicomponent nonstationary signals into coherent AM-FM modes is usually achieved by identifying “loud” time-frequency trajectories where energy is locally maximum. We will present here an alternative perspective that is based on “silent” points, namely spectrogram zeros. Based on a study of the statistics of such points in the case of white Gaussian noise, the rationale and the implementation of the new approach will be discussed, as well as an application to the characterization of actual gravitational wave chirps embedded in noise.
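
As a rough illustration of the idea (an assumption-laden sketch, not the speaker's method, which is built on the zeros of a Gaussian-window short-time Fourier transform), one can locate "silent" points of a white-noise spectrogram as local minima over the time-frequency grid:

```python
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(3)
x = rng.standard_normal(4096)                 # white Gaussian noise

# Spectrogram on a time-frequency grid (window parameters are illustrative)
f, t, S = spectrogram(x, fs=1.0, nperseg=128, noverlap=96)

# "Silent" points: strict local minima of S over the interior of the grid
interior = S[1:-1, 1:-1]
is_min = (
    (interior < S[:-2, 1:-1]) & (interior < S[2:, 1:-1]) &
    (interior < S[1:-1, :-2]) & (interior < S[1:-1, 2:])
)
n_minima = int(is_min.sum())
```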

__Biography:__

Patrick Flandrin received the engineer degree from ICPI Lyon, France, in 1978, and the Doct.-Ing. and Docteur d’État degrees from INP Grenoble, France, in 1982 and 1987, respectively. He joined CNRS in 1982, where he is currently Research Director. Since 1991, he has been with the Signals, Systems and Physics Group, within the Physics Department at ENS de Lyon, France. He is currently President of GRETSI, the French Association for Signal and Image Processing. His research interests include mainly nonstationary signal processing (with emphasis on time-frequency and time-scale methods), scaling stochastic processes and complex systems. He authored two monographs in those areas, the most recent one being Explorations in Time-Frequency Analysis (Cambridge University Press, 2018). Dr. Flandrin was awarded the Philip Morris Scientific Prize in Mathematics (1991), the SPIE Wavelet Pioneer Award (2001), the Prix Michel Monpetit from the French Academy of Sciences (2001), the Silver Medal from CNRS (2010), and the Technical Achievement Award from the IEEE Signal Processing Society (2017). A past Distinguished Lecturer of the IEEE Signal Processing Society (2010-2011), he is a Fellow of the IEEE (2002) and of EURASIP (2009), and he was elected a member of the French Academy of Sciences in 2010.

2:30 - 3:00

Coffee break

3:00 - 3:30

__Abstract:__

In this talk I will show how joint estimates based on the multivariate generalized Pareto distribution can be more useful for risk handling than currently available one-dimensional estimates. The multivariate generalized Pareto distribution arises as the limit of a suitably normalized vector conditioned upon at least one component of that vector being extreme. Statistical modelling using multivariate generalized Pareto distributions constitutes the multivariate analogue of peaks-over-thresholds modelling with the univariate generalized Pareto distribution, as often used in hydrology, meteorology, finance, …. New tools which make the multivariate generalized Pareto distribution practically useful include a construction device which provides a variety of new and existing parametric tail dependence models; censored likelihood estimation procedures; a threshold selection procedure; and several goodness-of-fit diagnostics. Examples include portfolio risk estimation, landslide risk, and prediction of extreme influenza epidemics. Joint work with Anna Kiriliouk, Johan Segers, Maud Thomas, and Jennifer Wadsworth.
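
A univariate peaks-over-thresholds fit, the building block that the multivariate theory generalizes, can be sketched with SciPy (the data, threshold choice, and query point are illustrative assumptions):

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(42)
data = rng.standard_t(df=4, size=20000)      # heavy-tailed sample

u = np.quantile(data, 0.95)                  # threshold (here: 95% quantile)
excesses = data[data > u] - u                # peaks over the threshold

# Fit the generalized Pareto distribution to the excesses (location fixed at 0)
xi, _, sigma = genpareto.fit(excesses, floc=0)

# Tail probability estimate: P(X > x) ~ P(X > u) * GPD survival of the excess
p_u = np.mean(data > u)
x0 = 5.0
tail_est = p_u * genpareto.sf(x0 - u, xi, scale=sigma)
```

The multivariate generalized Pareto models discussed in the talk replace this scalar excess distribution with a distribution for vectors having at least one extreme component.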

__Biography:__

Holger Rootzén is a Professor of Mathematical Statistics at Chalmers University of Technology, Gothenburg, Sweden. He is an elected member of the Royal Swedish Academy of Sciences and an adjunct member of the committee for awarding the Swedish National Bank’s Prize in Economic Sciences in Memory of Alfred Nobel; Associate Editor for the Annals of Statistics, JASA, and Extremes; and earlier Editor for the Scandinavian Journal of Statistics, Bernoulli, and Extremes. He leads a large Wallenberg project, “Big Data and Big Systems”, and has published about 90 papers in international journals and a book which continues to be a highly cited classic. His WoS h-index is 22 and his Google Scholar h-index 35. His research is about random processes. High-dimensional statistics for extreme episodes, the “shape” of extreme episodes in non-differentiable Gaussian processes, and modelling microscopic structures of soft materials are at the center of his interest right now. His research contributes to mitigation of the impact of extreme floods, windstorms, and heat waves caused by climate change; to risk handling in finance and insurance; to using naturalistic driving studies to prevent car crashes; and to the design of pharmaceutical tablet coatings.

3:30 - 4:00

__Abstract:__

We present a novel approach for the analysis of multivariate case-control data using Bayesian inference in the context of disease mapping, where the spatial distribution of different types of cancers is analysed. In particular, we propose a log-Gaussian Cox process for each set of cases and the controls, which accounts for risk factors and includes a term to measure spatial residual variation. This new framework is applied to a dataset of three different types of cancer and a set of controls from Alcalá de Henares (Madrid, Spain). Covariates available include the distance to several polluting industries, and our findings point to a risk increase due to the proximity to some of these industries.

__Biography:__

Virgilio Gómez-Rubio is Associate Professor in the Department of Mathematics, Universidad de Castilla-La Mancha (UCLM) in Spain. Prior to joining UCLM, he was a Research Associate in the Department of Epidemiology and Biostatistics, Imperial College London (U.K.). Dr. Gómez-Rubio has developed and contributed to a number of packages for the R software on spatial data analysis and Bayesian inference. He is also co-author of Springer’s bestselling book ‘Applied Spatial Data Analysis with R’. He has given courses on spatial data analysis and small area estimation at international conferences and universities worldwide. Currently, his main research interests are in Bayesian inference, spatial statistics and computational statistics. He is leading a project on the analysis of multivariate data for disease mapping to develop novel models, computational tools and software for Bayesian inference of spatio-temporal models. He is also involved in a project with the VABAR research group at the Universitat de València (Spain) on the analysis of highly correlated data, where he is developing models for the analysis of spatio-temporal data.

4:00 - 4:30

Discussion

8:00 – 8:50

Coffee and pastries

##### Session 3 Chair: Hernando Ombao, KAUST

9:00 – 9:30

__Abstract:__

Polyadenylation is a critical step for gene expression regulation during the maturation of mRNA. An accurate and robust method for poly(A) signal (PAS) identification is not only desired for the purpose of better transcript end annotation, but can also help us gain a deeper insight into the underlying regulatory mechanism. In this talk, I will introduce our work on developing machine learning techniques to identify true poly(A) signals based on information that is automatically extracted from their flanking genomic sequences. I will first introduce the traditional string kernel-based methods and our previous method that combines generative learning and discriminative learning. I will then propose a robust, PAS-motif-agnostic, and highly interpretable and transferrable deep learning framework to tackle the problem. Finally, I will touch upon our ongoing efforts on modeling actual PAS usage by deep learning methods.

__Biography:__

Dr. Xin Gao is an associate professor of computer science in the Computer, Electrical and Mathematical Sciences and Engineering Division at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. He is also a PI in the Computational Bioscience Research Center at KAUST and an adjunct faculty member at the David R. Cheriton School of Computer Science at the University of Waterloo, Canada. Prior to joining KAUST, he was a Lane Fellow at the Lane Center for Computational Biology in the School of Computer Science at Carnegie Mellon University, U.S. He earned his bachelor's degree in Computer Science in 2004 from the Computer Science and Technology Department at Tsinghua University, China, and his Ph.D. degree in Computer Science in 2009 from the David R. Cheriton School of Computer Science at the University of Waterloo, Canada. Dr. Gao’s research interests lie at the intersection between computer science and biology. In the field of computer science, he is interested in developing machine learning theories and methodologies. In the field of bioinformatics, his group works on building computational models, developing machine learning techniques, and designing efficient and effective algorithms to tackle key open problems along the path from biological sequence analysis, to 3D structure determination, to function annotation, and to understanding and controlling molecular behaviors in complex biological networks. He has co-authored more than 150 research articles in the fields of bioinformatics and machine learning.

9:30 – 10:00

__Abstract:__

Imaging genetics is an emerging field for the investigation of neural mechanisms linked to genetic variation. Although imaging genetics has recently shown great promise in understanding biological mechanisms for brain development and psychiatric disorders, studying the link between genetic variants and neuroimaging phenotypes remains statistically challenging due to the high dimensionality of both genetic and neuroimaging data. This becomes even more challenging when studying G × E effects on neuroimaging phenotypes. In this talk, I introduce a set-based mixed effect model for gene-environment interaction (MixGE) on neuroimaging phenotypes, such as structural volumes and tensor-based morphometry (TBM). This model incorporates both fixed and random effects of G × E to investigate homogeneous and heterogeneous contributions of multiple genetic variants and their interaction with environmental risks to phenotypes. We discuss the construction of score statistics for the terms associated with fixed and random effects of G × E to avoid direct parameter estimation in the MixGE model, which would greatly increase computational cost. I will discuss how the score statistics can be combined into a single significance value to increase statistical power.

__Biography:__

Dr. Qiu is Dean’s Chair Associate Professor in the Department of Biomedical Engineering and the Clinical Imaging Research Centre at the National University of Singapore. She is also a principal investigator at the Singapore Institute for Clinical Sciences of the Agency for Science, Technology and Research (A*STAR). Dr. Qiu has been devoted to innovation in computational analyses of complex and informative datasets comprising disease phenotypes, neuroimages, and genetic data, to understand the origins of individual differences in health throughout the lifespan. She received the Faculty Young Research Award and the 2016 Young Researcher Award of NUS. She has recently been appointed to an endowed “Dean’s Chair” associate professorship to honour her outstanding research achievements. She serves on the HBM program committee and on several editorial boards.

10:00 – 10:30

Coffee Break

10:30- 11:00

__Abstract:__

A topic which is becoming more and more popular in Functional Data Analysis is local inference, i.e., the continuous statistical testing of a null hypothesis along the domain. The principal issue in this topic is the infinite number of tested hypotheses, which can be seen as an extreme case of the multiple comparisons problem. During the talk we will define and discuss the notions of Family-wise Error Rate (FWER) and False Discovery Rate (FDR) in the setting of functional data defined on a compact set. We will then introduce two procedures (i.e., the interval-wise testing and a continuous version of the Benjamini-Hochberg procedure) able to control the FWER and the FDR over the functional domain, respectively, and finally describe their inferential properties in terms of control of the Type-I error probability and of consistency. The proposed method will be applied to satellite measurements of Earth temperature with the aim of identifying the regions of the planet where temperature has significantly increased in the last decades.
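
For reference, the classical (discrete) Benjamini-Hochberg procedure that the talk extends to a continuum of hypotheses can be sketched as follows (a textbook implementation, not the functional version presented):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean rejection mask controlling the FDR at level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                          # sort p-values ascending
    thresholds = q * np.arange(1, m + 1) / m       # BH step-up thresholds q*i/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest i with p_(i) <= q*i/m
        reject[order[:k + 1]] = True               # reject all smaller p-values
    return reject
```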

__Biography:__

Simone Vantini has been Associate Professor of Statistics at MOX (Modeling and Scientific Computing laboratory, Department of Mathematics, Politecnico di Milano, Italy) since 2015. He received his MSc degree (cum laude) in Nuclear Engineering in 2004 and his PhD degree in Mathematical Engineering in 2008. He has published widely in functional and high-dimensional data analysis, with more than 40 publications indexed in WoS and/or Scopus. His current research interests include object-oriented data analysis, functional data analysis, high-dimensional data analysis, permutation testing, dimension reduction, blind source separation, risk analysis, and, in general, statistical applications motivated by business or industrial problems. He has continuously collaborated with many national and international research centers, institutions, and companies from the private sector.

11:00 – 11:30

__Abstract:__

The success of neural network models in image and speech processing has generated great interest in expanding their application to business problems, an example of which is the classification of customers for marketing campaigns in the banking industry. However, the black-box nature of traditional neural networks limits their applicability. In recent years, new designs of neural network models with interpretability in mind have been proposed. In this paper, we report some results of our experimentation with such a model using a dataset from a bank marketing operation. In addition, we consider a new training strategy with the aim of achieving the desired interpretability. Finally, we propose alternative architectures and regularization techniques to improve the interpretability and classification performance.

__Biography:__

Dr. Ta-Hsin Li is a Research Staff Member at the IBM T. J. Watson Research Center in Yorktown Heights, NY. Dr. Li received the Ph.D. degree in applied mathematics from the University of Maryland, College Park, in 1992. Before joining IBM in 1999, he was on the faculty of the Statistics Department at Texas A&M University, College Station (1992–1997) and the Statistics and Applied Probability Department at the University of California, Santa Barbara (1998–2000). He was also an Adjunct Professor at Columbia University. His main research interests are time series analysis, statistical signal processing, and statistical and AI methods for business applications. He serves or has served as Associate Editor for Applied Stochastic Models in Business and Industry (2016–), the Journal of Statistical Theory and Practice (2011–), the EURASIP Journal on Applied Signal Processing (2006–), IEEE Transactions on Signal Processing (2000–2006, 2009–2012), and Technometrics (2013–2016). Dr. Li is a Fellow of the American Statistical Association (ASA) and a Senior Member of the Institute of Electrical and Electronics Engineers (IEEE). He is the author of the book Time Series with Mixed Spectra (CRC Press, 2013).

11:30 – 12:00

Discussion

12:00 – 1:30

Lunch Break

##### Session 4 Chair: Haavard Rue, KAUST

1:30 – 2:00

__Abstract:__

We develop unified theory and methodology for the inference of time-varying spectral densities for a general class of non-stationary and nonlinear processes. In particular, simultaneous confidence regions are constructed for the spectral densities on a nearly optimally dense grid of the time-frequency domain. A simulation-based method is proposed to implement the simultaneous confidence regions. The simultaneous confidence regions serve as a unified and visually friendly tool for a wide range of problems in time-frequency analysis, such as testing for stationarity and time-frequency separability, and validation of non-stationary linear models.

__Biography:__

Zhou Zhou obtained his Ph.D. in Statistics from the University of Chicago in 2009. From 2009 to 2015, he was Assistant Professor of Statistics at the University of Toronto. He was promoted to Associate Professor with tenure in 2015 and is currently the Associate Chair for Graduate Studies in the Department of Statistical Sciences, University of Toronto. Zhou’s major research interests lie in non-stationary time series analysis, nonparametric and semiparametric methods, change point analysis, and functional and longitudinal data analysis.

2:00 – 2:30

__Abstract:__

Mediation analysis is an important tool in the behavioral sciences for investigating the role of intermediate variables that lie in the path between a treatment and an outcome variable. The influence of the intermediate variable on the outcome is often explored using a linear structural equation model (LSEM) with model coefficients interpreted as possible effects. While there has been significant research on the topic, little work has been done when the intermediate variable (mediator) is a high-dimensional vector. In this work we introduce a novel method for identifying potential mediators in this setting called the principal directions of mediation (PDMs). PDMs linearly combine potential mediators into a smaller number of orthogonal components, with components ranked by the proportion of the mediation effects each accounts for. We demonstrate the method using a functional magnetic resonance imaging (fMRI) study of thermal pain where we are interested in determining which brain locations mediate the relationship between the application of a thermal stimulus and self-reported pain.

__Biography:__

Martin Lindquist is a Professor of Biostatistics at Johns Hopkins University. His research focuses on mathematical and statistical problems relating to functional Magnetic Resonance Imaging (fMRI). Dr. Lindquist is actively involved in developing new analysis methods to enhance our ability to understand brain function using human neuroimaging. He has published over 70 articles, and serves on the editorial boards of several scientific journals in both statistics and neuroimaging. He is a fellow of the American Statistical Association. He was awarded the 2018 Organization for Human Brain Mapping’s “Education in Neuroimaging” award.

2:30 – 3:00

Coffee Break

3:00 - 3:30

__Abstract:__

We conduct a large-dimensional study of discriminant analysis classifiers, covering three popular regularized variants: regularized LDA (R-LDA), regularized QDA (R-QDA), and regularized discriminant analysis (RDA). We start with the analysis of the two special cases R-LDA and R-QDA, and finally generalize to the RDA study. The analysis is based on the assumption that the data samples are drawn from a Gaussian mixture model with different means and covariances, and relies on tools from random matrix theory (RMT). We consider the double asymptotic regime in which both the data dimension and the training size within each class increase to infinity with a fixed ratio. Under some mild assumptions, we show that the probability of misclassification converges to a deterministic quantity which only depends on the class statistics and the data dimension. The result allows for a better understanding of the underlying classification algorithms in terms of their performance in practical, large but finite, dimensions. Further exploitation of the results permits optimal tuning of the regularization parameters with the aim of minimizing the probability of misclassification. The analysis is validated with numerical results involving synthetic as well as real data from the USPS dataset, yielding high accuracy in predicting the performance and hence making an interesting connection between theory and practice.
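
A minimal sketch of the R-LDA variant, on synthetic Gaussian-mixture data with a regularization parameter chosen purely for illustration (none of the constants below come from the talk), might look like:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 50, 40                        # dimension comparable to per-class sample size
mu0, mu1 = np.zeros(p), np.full(p, 0.6)

# Two Gaussian classes with identity covariance
X0 = rng.standard_normal((n, p)) + mu0
X1 = rng.standard_normal((n, p)) + mu1

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
S = (np.cov(X0.T) * (n - 1) + np.cov(X1.T) * (n - 1)) / (2 * n - 2)  # pooled cov

gamma = 0.5                                      # regularization parameter
S_reg = (1 - gamma) * S + gamma * np.eye(p)      # shrink toward the identity

w = np.linalg.solve(S_reg, m1 - m0)              # regularized discriminant direction

def classify(x):
    """Assign 1 if x falls on the class-1 side of the discriminant hyperplane."""
    return int(w @ (x - (m0 + m1) / 2) > 0)
```

Shrinking the pooled covariance toward the identity keeps the discriminant well defined even when the dimension exceeds the per-class sample size, which is the regime the random matrix analysis addresses.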

__Biography:__

Tareq Al-Naffouri received the B.S. degrees in mathematics and electrical engineering (with first honors) from King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, the M.S. degree in electrical engineering from the Georgia Institute of Technology, Atlanta, in 1998, and the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 2004. He was a visiting scholar at the California Institute of Technology, Pasadena, CA, in 2005 and summer 2006. He was a Fulbright Scholar at the University of Southern California in 2008. He is currently an Associate Professor in the Electrical Engineering Department at King Abdullah University of Science and Technology (KAUST). His research interests lie in the areas of sparse, adaptive, and statistical signal processing and their applications, localization, machine learning, and network information theory. He has over 240 publications in journals and conference proceedings, 9 standard contributions, 14 issued patents, and 8 pending.

3:30 – 4:00

__Abstract:__

Autoregressive Conditional Heteroscedasticity (ARCH) models, originally proposed by Engle (1982), have played a critical role in modelling volatilities in financial time series. This talk aims at a high-dimensional extension of ARCH models to evaluate volatility matrices for high-dimensional multivariate financial time series. The critical difficulty in the extension is the so-called “curse of dimensionality”: the number of parameters grows rapidly with the dimension of the multivariate series. We introduce financial distances among components of the multivariate series, which differ from the usual geographical ones and are instead based on closeness of financial conditions, and apply dynamic panel data models with spatial weight matrices constructed from the financial distances. As a result, we propose spatio-temporal GARCH models that can identify volatility matrices for high-dimensional financial time series. We conduct comparative studies on real financial time series and show empirical features of the spatio-temporal GARCH models in terms of volatility forecasting.
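
As background, a univariate GARCH(1,1) recursion, the scalar building block that the proposed spatio-temporal models extend to volatility matrices, can be simulated in a few lines (the parameter values are illustrative assumptions):

```python
import numpy as np

# Simulate a GARCH(1,1) process: r_t = sigma_t * z_t,
# sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2
omega, alpha, beta = 0.1, 0.1, 0.8      # alpha + beta < 1 ensures stationarity
rng = np.random.default_rng(7)
T = 2000

r = np.empty(T)
sigma2 = np.empty(T)
sigma2[0] = omega / (1 - alpha - beta)  # start at the unconditional variance (= 1.0)
r[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
for t in range(1, T):
    sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    r[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
```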

__Biography:__

Yasumasa Matsuda, professor of statistics at Tohoku University, Japan, received a Bachelor's degree in mathematics in 1991, a Master's degree in statistics in 1994, and a Ph.D. in statistics in 1999, all from the Tokyo Institute of Technology, Japan. He is a Japanese statistician working on spatial and spatio-temporal modeling of empirical behaviors in the social sciences. He started his career as a time series statistician and obtained results in the frequency-domain analysis of time series. More recently, his interests lie in extending time series methodology to spatial data.

4:00 – 4:30

Discussion

8:00 – 8:50

##### Session 5 Chair: Ying Sun, KAUST

9:00 – 9:30

__Abstract:__

We apply a Gaussian semiparametric estimator, originally developed for the estimation of stationary and nonstationary long-memory time series, to intrinsically stationary random fields. We then derive its asymptotic properties and conduct computational simulations to compare its performance with that of other estimation methods.
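In the time series setting, the Gaussian semiparametric estimator is the local Whittle estimator of the memory parameter d, which minimizes a profile objective over the first m Fourier frequencies of the periodogram. The sketch below is a plain-Python illustration of that one-dimensional starting point (naive DFT, grid-search minimization); the random-field extension discussed in the talk is not shown.

```python
import cmath
import math
import random

def local_whittle(x, m):
    """Local Whittle (Gaussian semiparametric) estimate of the memory
    parameter d, using the periodogram at the first m Fourier
    frequencies; the objective R(d) is minimized by grid search."""
    n = len(x)
    lam = [2.0 * math.pi * j / n for j in range(1, m + 1)]
    # Periodogram I(lambda_j) via a naive DFT (fine for small n).
    I = []
    for j in range(1, m + 1):
        s = sum(x[t] * cmath.exp(-2j * math.pi * j * t / n) for t in range(n))
        I.append(abs(s) ** 2 / (2.0 * math.pi * n))

    def R(d):
        G = sum(I[j] * lam[j] ** (2.0 * d) for j in range(m)) / m
        return math.log(G) - 2.0 * d * sum(math.log(l) for l in lam) / m

    grid = [k / 500.0 - 0.49 for k in range(491)]  # d in [-0.49, 0.49]
    return min(grid, key=R)

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(512)]
d_hat = local_whittle(x, m=32)  # white noise has true d = 0
```

The bandwidth m trades bias against variance, exactly the tuning issue that carries over to the random-field version studied in the talk.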

__Biography:__

I retired from the University of Tokyo in Japan two years ago. I am now a visiting professor at Tohoku University and a professor emeritus of the University of Tokyo. I have been studying statistical inference theory for time series analysis and random fields.

9:30 – 10:00

__Abstract:__

Here, we propose a spatio-temporal asymptotic framework for studying the properties of statistical inference for spatio-temporal models with possibly complex mean functions and possibly nonstationary covariance functions. In particular, a novel spatio-temporal expanding distance (STED) asymptotic framework is developed in a fixed spatio-temporal domain, providing a useful tool for studying spatio-temporal processes that are globally nonstationary in a rescaled fixed domain and locally stationary in a distance expanding domain. Different forms for the mean function are considered and the parameters are estimated by maximum likelihood estimation (MLE) or profile likelihood estimation (PLE). A simulation study is conducted and suggests sound empirical properties of our methods, while a health hazard data example further illustrates the methodology.

__Biography:__

Dr. Chu is a Lecturer at the School of Mathematics and Statistics, University of Melbourne. Before that, he was an Assistant Professor at Renmin University of China. His research areas include spatial statistics, spatio-temporal statistics, and variable selection.

10:00 – 10:30

Coffee Break

10:30 – 11:00

John Einmahl, Tilburg University, Netherlands: Limits to human life span through extreme value theory

__Abstract:__

There is no scientific consensus on the fundamental question whether the probability distribution of the human life span has a finite endpoint or not and, if so, whether this upper limit changes over time. Our study uses a unique dataset of the ages at death - in days - of all (about 285,000) Dutch residents, born in the Netherlands, who died in the years 1986-2015 at a minimum age of 92 years, and is based on extreme value theory, the coherent approach to research problems of this type. Unlike some other studies, we base our analysis on the configuration of thousands of mortality records of old people, not just the few oldest old. We find compelling statistical evidence that there is indeed an upper limit to the life span of men and to that of women for all the 30 years we consider and, moreover, that there are no indications of trends in these upper limits over the last 30 years, despite the fact that the number of people reaching high age (say 95 years) almost tripled. We also present estimates for the endpoints, for the force of mortality at very high age, and for the so-called perseverance parameter. This is joint work with Jesson Einmahl and Laurens de Haan.
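A finite endpoint corresponds to a negative extreme value index, and one classical way to estimate both from upper order statistics is the Pickands estimator. The sketch below applies it to synthetic data with a known finite endpoint; the "ages" here are fabricated for illustration (a uniform tail with endpoint 115) and are not the Dutch mortality data, nor the estimators actually used in the study.

```python
import math
import random

def pickands_endpoint(sample, k):
    """Pickands estimates of the extreme value index xi and, when
    xi < 0, of the finite right endpoint of the distribution."""
    x = sorted(sample)
    n = len(x)
    # Upper order statistics X_(n-k+1), X_(n-2k+1), X_(n-4k+1).
    a, b, c = x[n - k], x[n - 2 * k], x[n - 4 * k]
    xi = math.log((a - b) / (b - c)) / math.log(2.0)
    # Finite endpoint exists only when xi < 0.
    endpoint = a + (a - b) / (2.0 ** (-xi) - 1.0) if xi < 0 else float("inf")
    return xi, endpoint

rng = random.Random(1)
# Synthetic 'ages at death' with true endpoint 115 (uniform tail, xi = -1).
ages = [92.0 + 23.0 * rng.random() for _ in range(100_000)]
xi_hat, end_hat = pickands_endpoint(ages, k=500)
```

With a negative estimated index, the second return value gives a finite endpoint estimate, which is the kind of quantity (though not the exact estimator) reported in the talk.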

__Biography:__

John H.J. Einmahl is professor of Statistics at the Department of Econometrics & OR at Tilburg University. He is a fellow of the Institute of Mathematical Statistics. John's research has been published in leading journals in Statistics and Probability Theory, like the Annals of Statistics, the Annals of Probability, JRSS B, and JASA. His research interests are mainly in the area of nonparametric statistics and its ramifications, including statistics of extremes, empirical likelihood, generalized and multivariate quantiles, and (local) empirical processes.

11:00 - 11:30

__Abstract:__

The National Resources Inventory (NRI) Survey is one of the largest annual longitudinal surveys of soil, water, and related environmental resources in the US, designed to assess conditions and trends on non-federal US lands and to provide accurate national and state estimates. One challenge in the NRI is a multi-year lag in publishing the data due to resource constraints on data collection. We also receive requests from local stakeholders to provide data at the county and small-watershed level. To provide more timely estimates at smaller spatial scales, it is necessary to integrate alternative big data sources, such as administrative data and satellite data, with the survey data in our estimation. In this talk we give a brief introduction to the NRI and share our experience using satellite data and machine learning methods to improve NRI estimation. We introduce a new spatio-temporal functional imputation method for satellite data gap-filling and machine learning methods for satellite-based land-cover classification, both of which are useful for NRI small area estimation and forecasting.
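The gap-filling problem the abstract mentions arises because satellite time series have missing pixels (clouds, sensor outages). As a toy stand-in for the spatio-temporal functional imputation discussed in the talk, the sketch below fills gaps in a single pixel's time series by linear interpolation between the nearest observed neighbors; the function name and approach are illustrative assumptions, not the speakers' method.

```python
def gap_fill(series):
    """Fill missing values (None) in one pixel's time series by linear
    interpolation between the nearest observed time points; edge gaps
    are filled by carrying the nearest observation."""
    filled = list(series)
    n = len(filled)
    for i, v in enumerate(series):
        if v is not None:
            continue
        # Nearest observed neighbors on each side (search the original
        # series so already-imputed values are never reused).
        lo = next((j for j in range(i - 1, -1, -1)
                   if series[j] is not None), None)
        hi = next((j for j in range(i + 1, n)
                   if series[j] is not None), None)
        if lo is not None and hi is not None:
            w = (i - lo) / (hi - lo)
            filled[i] = series[lo] * (1.0 - w) + series[hi] * w
        elif lo is not None:
            filled[i] = series[lo]   # trailing gap: carry forward
        elif hi is not None:
            filled[i] = series[hi]   # leading gap: carry backward
    return filled

out = gap_fill([1.0, None, None, 4.0])
```

A functional or spatio-temporal method would additionally borrow strength from neighboring pixels and smooth across the whole series, but the per-pixel interpolation above shows the basic imputation step.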

__Biography:__

Zhengyuan Zhu is Professor of Statistics at Iowa State University and the director of the Center for Survey Statistics and Methodology. He received his Ph.D. degree in Statistics from the University of Chicago and was an Assistant Professor of Statistics at the University of North Carolina at Chapel Hill before joining Iowa State University in 2009. He has expertise in spatial statistics, survey statistics, spatial sampling design, and time series analysis, and is interested in applications in environmental statistics, remote sensing, natural resource surveys, and agricultural statistics. He is the Principal Investigator or co-Principal Investigator for a number of national large-scale longitudinal surveys, including the US National Resources Inventory survey, the US BLM-Managed Lands survey, and the surveys for the Conservation Effects Assessment Project.

11:30 – 11:45

Discussion

11:45 – 12:45

Lunch break

12:45 – 1:45

Visit to the Data Visualization Lab Meeting Place: Library Coffee

2:00 – 3:30

Poster Session #1 AUDITORIUM 0215 Note: poster set-up at 1 pm

4:00 – 5:30

Poster Session #2 AUDITORIUM 0215 Note: poster set-up at 3:30 pm

6:15 – 8:45

Conference Dinner @ AL MARSA Stroll or take the blue bus

8:25

Walk from KAUST Inn 2 to the harbor

9:00 - 1:00

Snorkeling