Trio of tuning tools for modeling large spatial datasets

The team used high-resolution soil moisture measurements from the Mississippi Basin to apply and test their approximation method.


Predictive modeling of very large datasets, such as environmental measurements, across a wide area can be a highly computationally intensive exercise. These computational demands can be significantly reduced by applying various approximations, but at what cost to accuracy? KAUST researchers have now developed statistical tools that help remove the guesswork from this approximation process.

“In spatial statistics, it is extremely time consuming to fit a standard process model to large datasets using the most accurate likelihood-based methods,” says Yiping Hong, who led the research. “Approximation methods can cut down the computation time and computing resources significantly.”

Rather than modeling the relationship between each pair of observations explicitly with a standard process model, approximation methods adopt an alternative structure to describe the relationships in the data. This trades some accuracy for a large reduction in computation. The tile low-rank (TLR) estimation method developed at KAUST, for example, divides the covariance matrix into blocks and applies a low-rank approximation to each block to cut computation time.
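The block-wise idea can be sketched in a few lines. The snippet below is a minimal illustration, not the team's implementation: it builds an exponential covariance matrix over random 2D locations, then compresses each off-diagonal tile with a truncated SVD. The tile size and truncation tolerance are illustrative stand-ins for the tuning parameters discussed below.

```python
# Minimal sketch of a tile low-rank (TLR) covariance approximation.
# Assumes an exponential covariance kernel on random 2D locations;
# tile size and tolerance are illustrative tuning parameters.
import numpy as np

rng = np.random.default_rng(0)
n, tile = 256, 64                       # problem size and tile (block) size
pts = rng.random((n, 2))                # random spatial locations

# Exponential covariance: C[i, j] = exp(-||s_i - s_j|| / range)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
C = np.exp(-d / 0.3)

def tlr_approx(C, tile, tol=1e-3):
    """Replace each off-diagonal tile by a truncated SVD of accuracy tol."""
    A = C.copy()
    for i in range(0, C.shape[0], tile):
        for j in range(0, C.shape[1], tile):
            if i == j:
                continue                # keep diagonal tiles dense
            U, s, Vt = np.linalg.svd(C[i:i+tile, j:j+tile])
            k = max(1, int(np.sum(s > tol * s[0])))   # rank kept in this tile
            A[i:i+tile, j:j+tile] = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return A

A = tlr_approx(C, tile)
rel_err = np.linalg.norm(C - A) / np.linalg.norm(C)
print(f"relative Frobenius error: {rel_err:.2e}")
```

Storing only the retained singular vectors per tile is what reduces memory and arithmetic; a looser tolerance shrinks the kept ranks at the price of a larger approximation error.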

“Thus, one needs to determine some tuning parameters, such as how many blocks to split the matrix into and the precision of the block approximation,” says Hong. “For this, we developed three criteria to assess the loss of prediction efficiency, or the loss of information, when the model is approximated.”
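The team's three criteria are not reproduced here, but the notion of "loss of prediction efficiency" can be illustrated with a hand-rolled comparison: build simple-kriging weights from an exact covariance and from a crude low-rank stand-in approximation of it, then measure how much the approximate weights inflate the true prediction error. All kernel choices and sizes below are assumptions for the sketch.

```python
# Illustration (not the paper's criteria) of prediction-efficiency loss:
# compare the true MSE of kriging weights built from an approximated
# covariance against the best achievable MSE from the exact covariance.
import numpy as np

rng = np.random.default_rng(1)
n, r, tau2 = 100, 30, 0.1               # sites, kept rank, nugget variance
pts = rng.random((n, 2))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
K = np.exp(-d / 0.3)                    # smooth exponential covariance
C = K + tau2 * np.eye(n)                # exact covariance with nugget

# Crude stand-in approximation: rank-r eigendecomposition of the smooth part
w, V = np.linalg.eigh(K)
C_apx = (V[:, -r:] * w[-r:]) @ V[:, -r:].T + tau2 * np.eye(n)

obs, tgt = np.arange(n - 1), n - 1      # predict the last site from the rest
Coo, cot, Ctt = C[np.ix_(obs, obs)], C[obs, tgt], C[tgt, tgt]

lam = np.linalg.solve(Coo, cot)         # optimal simple-kriging weights
lam_apx = np.linalg.solve(C_apx[np.ix_(obs, obs)], C_apx[obs, tgt])

mse = Ctt - cot @ lam                   # best achievable prediction MSE
mse_apx = Ctt - 2 * lam_apx @ cot + lam_apx @ Coo @ lam_apx
print(f"prediction efficiency loss: {(mse_apx - mse) / mse:.3%}")
```

Because the exact weights minimize the true MSE, the loss is never negative; a good choice of tuning parameters keeps it negligibly small while maximizing the computational savings.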

