There is an increasing demand for describing, predicting, and drawing inferences about environmental processes such as air pollution and precipitation. Environmental statistics plays an important role in many related applications, such as weather-related risk assessment for urban design and crop growth. However, modeling the spatio-temporal dynamics of environmental data is challenging because of their inherent high variability and nonstationarity. This dissertation is composed of four significant contributions to the modeling, simulation, and prediction of spatio-temporal processes using statistical techniques and machine learning algorithms.
This dissertation first focuses on Gaussian process emulators of numerical climate models over a large spatial region, where the spatial process exhibits nonstationarity. The proposed method allows for estimating a rich class of nonstationary Matérn covariance functions with spatially varying parameters. Efficient estimation is achieved by local-polynomial fitting of the covariance parameters. To extend the method to large-scale computations, software for nonstationary Gaussian process estimation and simulation is developed on high-performance computing architectures. The developed software outperforms existing software by a large margin in both computational time and accuracy. The method and software are applied to the statistical emulation of high-resolution climate models.
The second focus of this dissertation is the development of spatio-temporal stochastic weather generators for non-Gaussian and nonstationary processes. The proposed multi-site generator uses a left-censored non-Gaussian vector autoregression model in which the random error follows a skew-symmetric distribution. It not only models occurrence and intensity simultaneously but also has clear physical and statistical interpretations. The generator is applied to 30-second precipitation data collected at the University of Lausanne.
Finally, this dissertation investigates spatial prediction with scalable deep learning algorithms to overcome the limitations of the classical Kriging predictor in geostatistics. A novel neural network structure is proposed for spatial prediction by adding an embedding layer that represents the spatial coordinates through basis functions. The proposed method, called DeepKriging, has multiple advantages over Kriging and over classical neural networks that use spatial coordinates as features. The method is applied to the prediction of fine particulate matter (PM2.5) concentrations in the United States.
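To make the idea of an embedding layer of spatial coordinates with basis functions more concrete, the following minimal Python sketch maps one location in the unit square to a vector of basis-function values. It is only an illustration under stated assumptions, not the dissertation's implementation: the Wendland radial basis, the regular knot grids, the two resolutions, and the names `wendland`, `make_knots`, `embed`, and `bandwidth_factor` are all hypothetical choices made here.

```python
import math


def wendland(d):
    """Compactly supported Wendland radial basis (assumed choice).

    Equals 1 at d = 0 and is exactly 0 for d >= 1.
    """
    if d >= 1.0:
        return 0.0
    return (1.0 - d) ** 6 * (35.0 * d ** 2 + 18.0 * d + 3.0) / 3.0


def make_knots(n_per_side):
    """Regular n x n grid of knots on the unit square (assumed layout)."""
    ticks = [i / (n_per_side - 1) for i in range(n_per_side)]
    return [(x, y) for x in ticks for y in ticks]


def embed(coord, resolutions=(3, 5), bandwidth_factor=2.5):
    """Map one (x, y) in [0, 1]^2 to a vector of basis-function values.

    Each resolution contributes one value per knot; coarse grids capture
    large-scale structure and fine grids capture local structure.
    """
    features = []
    for n in resolutions:
        # Support radius shrinks as the knot spacing 1 / (n - 1) shrinks.
        theta = bandwidth_factor / (n - 1)
        for knot in make_knots(n):
            dist = math.dist(coord, knot)
            features.append(wendland(dist / theta))
    return features


phi = embed((0.5, 0.5))
print(len(phi))  # 3*3 + 5*5 = 34 basis values for this location
```

In a network of this kind, the vector returned by `embed` would replace the raw coordinates as the input to the dense layers, optionally concatenated with other covariates.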
Yuxiao Li is a Ph.D. candidate supervised by Prof. Ying Sun in the Statistics Program at King Abdullah University of Science and Technology. He received his M.S. in statistics from the University of California, Irvine, and his B.S. in applied mathematics from Beijing Institute of Technology in China. His research interests include computational methods for large datasets, deep learning, environmental statistics, spatio-temporal statistics, and stochastic weather generators.