Computation of High-Dimensional Multivariate Normal and Student-t Probabilities Based on Matrix Compression Schemes


KAUST

The thesis focuses on the computation of high-dimensional multivariate normal (MVN) and multivariate Student-t (MVT) probabilities. First, a generalization of the conditioning method for MVN probabilities is proposed and combined with the hierarchical matrix representation. Next, I revisit the Quasi-Monte Carlo (QMC) method and improve the state-of-the-art QMC method for MVN probabilities with block reordering, achieving a tenfold speed improvement. The thesis then introduces a novel matrix compression scheme based on Kronecker products, whose memory footprint is more than one order of magnitude smaller than that of hierarchical matrices. A corresponding Cholesky factorization algorithm is designed and shown to complete the factorization in one million dimensions within 600 seconds. To make the computational methods for MVN probabilities more accessible, I introduce an R package that implements the methods developed in this thesis and show that it is currently the most scalable package for computing MVN probabilities in R. Finally, as an application, I derive the posterior properties of the probit Gaussian random field and show that this package makes model selection and posterior prediction feasible in high dimensions.
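As a minimal illustration of the Monte Carlo machinery that these methods build on (a sketch only; it implements neither the conditioning method nor the QMC algorithms of the thesis, and all names are illustrative), the following Python snippet estimates an MVN probability P(a ≤ X ≤ b) by drawing correlated samples through the Cholesky factor of the covariance matrix:

```python
import numpy as np

def mvn_prob_mc(a, b, sigma, n_samples=100_000, seed=0):
    """Crude Monte Carlo estimate of P(a <= X <= b) for X ~ N(0, sigma).

    Samples are generated as z @ L.T with L the lower Cholesky factor,
    so each sample has the target covariance sigma = L @ L.T.
    """
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(sigma)
    z = rng.standard_normal((n_samples, len(sigma)))
    x = z @ L.T                              # correlated samples
    inside = np.all((x >= a) & (x <= b), axis=1)
    return inside.mean()

# Example: two independent standard normals, negative-orthant probability
sigma = np.eye(2)
p = mvn_prob_mc(np.array([-np.inf, -np.inf]), np.array([0.0, 0.0]), sigma)
# true value is 0.25; the estimate is accurate to roughly 3 decimal places
```

The point of the thesis methods is precisely that this naive estimator scales poorly: the conditioning, reordering, and compressed-matrix techniques above are what make such probabilities tractable when n reaches tens of thousands or more.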

Overview

Abstract

The first half of the thesis focuses on the computation of high-dimensional multivariate normal (MVN) and multivariate Student-t (MVT) probabilities. Chapter 2 generalizes the bivariate conditioning method to a d-dimensional conditioning method and combines it with a hierarchical representation of the n × n covariance matrix. The resulting two-level hierarchical-block conditioning method requires Monte Carlo simulations to be performed only in d dimensions, with d ≪ n, and allows the complexity of the algorithm's major cost to be O(n log n). Chapter 3 improves the block reordering scheme from Chapter 2 and integrates it into Quasi-Monte Carlo simulation under the tile-low-rank representation of the covariance matrix. Simulations up to dimension 65,536 suggest that this method improves the run time by one order of magnitude compared with the hierarchical Monte Carlo method. The second half of the thesis discusses a novel matrix compression scheme based on Kronecker products, an R package that implements the methods described in Chapter 3, and an application study with the probit Gaussian random field. Chapter 4 studies the potential of the sum of Kronecker products (SKP) as a compressed covariance matrix representation. Experiments show that the SKP representation reduces the memory footprint by one order of magnitude compared with the hierarchical representation for covariance matrices from large grids, and that the Cholesky factorization in one million dimensions can be completed within 600 seconds. In Chapter 5, I describe the R package, tlrmvnmvt, that implements the methods in Chapter 3 and show how the package improves the accuracy of computed excursion sets. Chapter 6 derives the posterior properties of the probit Gaussian random field, based on which model selection and posterior prediction are performed. With the tlrmvnmvt package, the computation becomes feasible in tens of thousands of dimensions, where the prediction errors are significantly reduced.
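The storage advantage of the SKP representation in Chapter 4 is easy to see in a toy example (a sketch in Python with illustrative names, not the thesis's construction for gridded covariance matrices): a single term kron(U, V), with U of size m × m and V of size p × p, stands for an (mp) × (mp) matrix while storing only m² + p² entries per term.

```python
import numpy as np

def skp_dense(factors):
    """Reconstruct the dense matrix from an SKP representation:
    a list of (U_k, V_k) pairs standing for sum_k kron(U_k, V_k)."""
    return sum(np.kron(U, V) for U, V in factors)

rng = np.random.default_rng(0)
m, p, r = 8, 8, 2                     # factor sizes and number of SKP terms
factors = [(rng.standard_normal((m, m)),
            rng.standard_normal((p, p))) for _ in range(r)]

A = skp_dense(factors)                # dense (m*p) x (m*p) matrix
dense_storage = A.size                # (m*p)**2 = 4096 entries
skp_storage = r * (m * m + p * p)     # 2 * (64 + 64) = 256 entries
# the SKP form stores 16x fewer entries in this toy example
```

The gap widens rapidly with the matrix size, which is what allows the savings of an order of magnitude over hierarchical matrices reported in Chapter 4, provided a covariance matrix is well approximated by a small number of Kronecker terms.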

Brief Biography

Jian Cao is a Ph.D. candidate supervised by Prof. Marc Genton in the Statistics Program at King Abdullah University of Science and Technology. He received his Master of Finance from Shanghai Jiao Tong University in 2016 and his Bachelor's degree in Applied Mathematics from the University of Science and Technology of China in 2014. His research interests include scientific computing, high-performance computing, spatial statistics, and sparse matrix operations.
