Scalable Methods for Multivariate Normal Probability Estimation with Applications in Confidence Region Detection, Transport Phenomena, and Parallel Computing Using RCOMPSs

Overview

Computing high-dimensional multivariate normal (MVN) probabilities is a recurring bottleneck in spatial statistics. Classical methods such as the Separation-Of-Variables (SOV) algorithm are reliable, but their dependence on dense Cholesky factorization leads to O(n^3) time and O(n^2) memory costs, where n is the problem dimension. As a result, methods that are statistically well understood often become impractical at the scales now common in environmental and geospatial applications. This thesis addresses that gap by combining high-performance computing, numerical approximation, and transport-based covariance modeling to make several important spatial procedures usable at larger scales.
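To make the role of the Cholesky factor concrete, the following is a minimal sketch of an SOV-style estimator for P(a <= X <= b) with X ~ N(0, Sigma). It assumes plain Monte Carlo sampling and omits variable reordering. The function name `mvn_prob_sov` and its parameters are illustrative and are not taken from the thesis software.

```python
import numpy as np
from statistics import NormalDist

_std = NormalDist()
# Vectorized standard normal CDF and inverse CDF built on the standard library.
norm_cdf = np.vectorize(_std.cdf, otypes=[float])
norm_ppf = np.vectorize(_std.inv_cdf, otypes=[float])

def mvn_prob_sov(a, b, cov, n_samples=20000, seed=0):
    """Estimate P(a <= X <= b) for X ~ N(0, cov) with an SOV-style transform."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    L = np.linalg.cholesky(cov)          # dense factorization: the O(n^3) step
    rng = np.random.default_rng(seed)
    w = rng.random((n_samples, n))       # uniform draws, one row per sample
    y = np.zeros((n_samples, n))
    prob = np.ones(n_samples)
    for i in range(n):
        # Contribution of the variables already sampled to the i-th limits.
        shift = y[:, :i] @ L[i, :i]
        lo = norm_cdf((a[i] - shift) / L[i, i])
        hi = norm_cdf((b[i] - shift) / L[i, i])
        prob *= hi - lo                  # conditional probability of dimension i
        # Sample y_i inside its conditional limits; clip to keep inv_cdf finite.
        u = np.clip(lo + w[:, i] * (hi - lo), 1e-12, 1.0 - 1e-12)
        y[:, i] = norm_ppf(u)
    return prob.mean()

# Bivariate check with correlation 0.5: the exact orthant probability is
# 1/4 + arcsin(0.5) / (2*pi) = 1/3.
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
estimate = mvn_prob_sov([-np.inf, -np.inf], [0.0, 0.0], cov)
print(estimate)
```

The loop over dimensions touches one row of the Cholesky factor at a time, which is why the factorization dominates the cost as n grows.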

First, we develop a parallel implementation of the SOV algorithm based on task-based scheduling and tile-based linear algebra. We then incorporate Tile Low-Rank (TLR) approximation to reduce the arithmetic and memory cost of the factorization while retaining the accuracy needed for confidence region detection. On synthetic and real wind-speed datasets, the proposed framework achieves up to 20X speedups over dense implementations.
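The idea behind TLR compression can be sketched on a single off-diagonal tile of a covariance matrix. The example below assumes a squared-exponential kernel on a one-dimensional grid and a truncated SVD as the compression step. The kernel, tile size, tolerance, and function names are illustrative choices, not the settings used in the thesis.

```python
import numpy as np

def sq_exp_covariance(points, length_scale=0.3):
    """Squared-exponential covariance on 1D locations (illustrative kernel)."""
    d = points[:, None] - points[None, :]
    return np.exp(-(d / length_scale) ** 2)

def compress_tile(tile, tol=1e-6):
    """Return factors (U, V) with tile ~= U @ V, truncated at a relative tolerance."""
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    rank = max(1, int(np.sum(s > tol * s[0])))
    return U[:, :rank] * s[:rank], Vt[:rank, :]

n, tile_size = 512, 128
points = np.linspace(0.0, 1.0, n)
cov = sq_exp_covariance(points)

# First off-diagonal tile: rows of the second block, columns of the first.
tile = cov[tile_size:2 * tile_size, :tile_size]
U, V = compress_tile(tile)
rank = U.shape[1]
rel_err = np.linalg.norm(tile - U @ V) / np.linalg.norm(tile)

dense_entries = tile.size
tlr_entries = U.size + V.size
print(rank, rel_err, tlr_entries / dense_entries)
```

Storing the two thin factors instead of the full tile is what lowers memory, and operating on them instead of the dense tile is what lowers the operation count. Diagonal tiles are typically kept dense.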

Second, we study confidence regions for geostatistical excursion sets, whose construction requires repeated large covariance operations and conditional simulations. We reformulate the confidence-region algorithm as a collection of tile-based operations executed through the PaRSEC runtime, with GPU acceleration and mixed-precision arithmetic in the dominant kernels. The resulting implementation is substantially faster than the R-based baseline, reaching up to 33X speedups while matching the original method's statistical output in our experiments.

Third, we extend the Lagrangian framework for spatio-temporal covariance modeling by introducing random acceleration in addition to random velocity. A quadratic-form completion identity yields closed-form covariance and cross-covariance expressions for several transported Gaussian models. Simulation studies and an application to GOES-19 satellite cloud imagery show that acceleration can improve short-term prediction when transport paths bend or decelerate over time.
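As an illustration of how completing a quadratic form leads to a closed-form covariance, consider a frozen field with spatial covariance exp(-a ||h||^2) that is transported by a random displacement. The sketch below assumes independent Gaussian velocity V and acceleration A, so that the displacement between times t1 and t2 is (t1 - t2) V + (t1^2 - t2^2) A / 2, which is itself Gaussian. These modeling choices and all names are assumptions made for the example and need not match the models developed in the thesis. The closed form is compared against a Monte Carlo average.

```python
import numpy as np

def lagrangian_cov(h, t1, t2, mu_v, cov_v, mu_a, cov_a, a=1.0):
    """Closed-form E[exp(-a ||h - D||^2)] for a Gaussian displacement D."""
    u = t1 - t2
    q = 0.5 * (t1 ** 2 - t2 ** 2)
    m = u * mu_v + q * mu_a                  # mean displacement
    S = u ** 2 * cov_v + q ** 2 * cov_a      # displacement covariance
    M = np.eye(len(h)) + 2.0 * a * S
    r = h - m
    return np.linalg.det(M) ** -0.5 * np.exp(-a * r @ np.linalg.solve(M, r))

def lagrangian_cov_mc(h, t1, t2, mu_v, cov_v, mu_a, cov_a, a=1.0,
                      n_samples=200000, seed=0):
    """Monte Carlo version of the same expectation, used only as a check."""
    rng = np.random.default_rng(seed)
    v = rng.multivariate_normal(mu_v, cov_v, size=n_samples)
    acc = rng.multivariate_normal(mu_a, cov_a, size=n_samples)
    d = (t1 - t2) * v + 0.5 * (t1 ** 2 - t2 ** 2) * acc
    return np.exp(-a * np.sum((h - d) ** 2, axis=1)).mean()

# Illustrative parameter values.
h = np.array([0.5, 0.2])
mu_v, cov_v = np.array([0.3, 0.1]), 0.05 * np.eye(2)
mu_a, cov_a = np.array([0.2, -0.1]), 0.02 * np.eye(2)

closed = lagrangian_cov(h, 1.0, 0.0, mu_v, cov_v, mu_a, cov_a)
sampled = lagrangian_cov_mc(h, 1.0, 0.0, mu_v, cov_v, mu_a, cov_a)
print(closed, sampled)
```

Under these assumptions the acceleration term enters through t1^2 - t2^2, so the covariance depends on the two times separately and not only on their lag.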

Finally, we introduce RCOMPSs, a task-based runtime system that brings parallel and distributed execution to R with limited changes to user code. Built on the COMPSs framework, RCOMPSs lets users annotate sequential R functions as tasks while the runtime handles dependency analysis, scheduling, and data movement. Experiments with k-means clustering, k-nearest neighbors, and linear regression show consistent speedups on both multicore and distributed platforms.
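The task pattern described above can be illustrated with one k-means update in which each data block is processed by an independent task and the partial results are merged afterwards. The sketch uses Python's standard `concurrent.futures` module purely as a stand-in. It does not use or reproduce the RCOMPSs or COMPSs API, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def partial_sums(block, centers):
    """Task: assign each point in a block to its nearest center."""
    d = ((block[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    sums = np.zeros_like(centers)
    np.add.at(sums, labels, block)           # per-cluster coordinate sums
    counts = np.bincount(labels, minlength=len(centers))
    return sums, counts

def update_centers(results, centers):
    """Merge step: combine partial sums and counts into new centers."""
    sums = sum(r[0] for r in results)
    counts = sum(r[1] for r in results)
    new = centers.copy()
    nonempty = counts > 0                    # keep old center if a cluster is empty
    new[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new

def kmeans_step(blocks, centers, pool):
    """Submit one task per block, then merge once all tasks have finished."""
    futures = [pool.submit(partial_sums, b, centers) for b in blocks]
    return update_centers([f.result() for f in futures], centers)

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 2))
blocks = np.array_split(data, 4)
centers = data[:3].copy()

with ThreadPoolExecutor(max_workers=4) as pool:
    new_centers = kmeans_step(blocks, centers, pool)

# Same update computed on the whole dataset in a single call, for comparison.
serial_centers = update_centers([partial_sums(data, centers)], centers)
print(new_centers)
```

Here the dependency between the block tasks and the merge is written out by hand through the futures. In a task-based runtime of the kind described above, that dependency analysis is left to the runtime.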

Taken together, these contributions show how advances in parallel computing and statistical modeling can be combined to widen the practical range of modern spatial methods, from probability computation and uncertainty quantification to transport-aware covariance modeling and scalable statistical software.

Presenters

Brief Biography