Dataset size counts for better predictions

The model was applied to a spatial dataset of the Mississippi River basin in the United States to improve understanding of hydrological processes and climate variability.
© Ian Dagnall / Alamy Stock Photo

Researchers at KAUST have developed a new statistical tool for modeling large climate and environmental datasets, with broad applications ranging from weather forecasting to flood warning and irrigation management.

Climate and environmental datasets are often very large, containing measurements taken across many locations and over long periods. Their large sample sizes and high dimensionality introduce significant statistical and computational challenges. Gaussian process models used in spatial statistics, for example, become computationally prohibitive at this scale, forcing analysts to rely on subsamples or to analyze the data region by region.

Ying Sun and her PhD student Huang Huang developed a new method that applies a hierarchical low-rank approximation to overcome this computational burden, providing an efficient tool for fitting Gaussian process models to datasets containing large quantities of climate and environmental measurements.

“One advantage of our method is that we apply the low-rank approximation hierarchically when fitting the Gaussian process model, which makes analyzing large spatial datasets possible without excessive computation,” explains Huang. “The challenge, however, is to retain estimation accuracy by using a computationally efficient approximation.”
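To give a rough sense of the idea, the sketch below shows a generic low-rank (Nyström-style) approximation of a Gaussian process covariance matrix. It is not the authors' hierarchical scheme; all locations, covariance choices, and parameter values are illustrative assumptions.

```python
# A minimal sketch (not the authors' implementation) of a low-rank
# approximation to a Gaussian process covariance matrix.
import numpy as np

def exp_cov(x1, x2, range_param=0.2, variance=1.0):
    """Exponential covariance between two sets of 1-D locations."""
    d = np.abs(x1[:, None] - x2[None, :])
    return variance * np.exp(-d / range_param)

rng = np.random.default_rng(0)
n, m = 2000, 50                      # n observations, m knot locations (m << n)
x = np.sort(rng.uniform(0, 1, n))    # observation locations
u = np.linspace(0, 1, m)             # knots defining the low-rank basis

K_nm = exp_cov(x, u)                 # cross-covariance, n x m
K_mm = exp_cov(u, u)                 # knot covariance, m x m

# Low-rank approximation: K ~= K_nm K_mm^{-1} K_nm^T has rank m instead of n,
# so likelihood evaluations cost O(n m^2) rather than O(n^3).
L = np.linalg.cholesky(K_mm + 1e-10 * np.eye(m))
B = np.linalg.solve(L, K_nm.T)       # m x n factor, so K ~= B.T @ B
print("approximate rank:", np.linalg.matrix_rank(B))
```

Sun and Huang's contribution is to apply this kind of approximation hierarchically, which is what keeps the estimation accurate while remaining computationally feasible for very large spatial datasets.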
