Heteroscedastic BART Using Multiplicative Regression Trees

Abstract

Bayesian additive regression trees (BART) has become increasingly popular as a flexible and scalable non-parametric model for many modern applied regression problems. It brings many advantages to the practitioner dealing with large and complex non-linear response surfaces, such as a matrix-free formulation and no requirement to specify a regression basis a priori. However, while flexible in fitting the mean, the basic BART model relies on the standard i.i.d. normal model for the errors. This assumption is unrealistic in many applications. Moreover, in many applied problems, understanding the relationship between the variance and the predictors can be just as important as understanding the mean model. We develop a novel heteroscedastic BART model to address these concerns. Our approach is entirely non-parametric and does not rely on an a priori basis for the variance model. In BART, the conditional mean is modeled as a sum of trees, each of which determines a contribution to the overall mean. In this talk, we model the conditional variance with a product of trees, each of which determines a contribution to the overall variance. We implement the approach and demonstrate it on a simple low-dimensional simulated dataset, a higher-dimensional dataset of used car prices, a fisheries dataset, and data from an alcohol consumption study.
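
As a rough sketch of the sum-of-trees and product-of-trees structure described in the abstract, the model can be written as follows. The notation is an assumption made here for illustration and does not come from the abstract: g and h denote tree functions with tree structures T_j, T'_l and leaf parameters M_j, M'_l, m and m' are the numbers of mean and variance trees, each h is taken to be positive, and Z is a standard normal error.

\[
y(x) = f(x) + s(x)\, Z, \qquad Z \sim N(0, 1),
\]
\[
f(x) = \sum_{j=1}^{m} g(x; T_j, M_j), \qquad
s^2(x) = \prod_{l=1}^{m'} h(x; T'_l, M'_l).
\]

Under this notation, holding s^2(x) constant recovers the i.i.d. normal error model of basic BART, and taking logarithms turns the product into a sum, \log s^2(x) = \sum_{l=1}^{m'} \log h(x; T'_l, M'_l), so each variance tree contributes additively on the log scale.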

Brief Biography

Matthew Pratola’s research is focused on two areas of statistical methodology: (1) computationally scalable and flexible Bayesian non-parametric regression models for high-dimensional “big data”; and (2) statistical models for calibrating complex simulation models to real-world observations for parameter estimation, prediction, and uncertainty quantification. In the first area, he has worked on Bayesian regression tree models, which have high practical impact due to their flexible modeling of complex non-linear response surfaces and their ease of interpretation. His work in this area has focused on increasing computational scalability, improving the Markov chain Monte Carlo fitting algorithm that searches the space of trees, and introducing a flexible non-parametric heteroscedastic modeling approach to handle large heterogeneous datasets. In the second area, he has developed novel approaches for calibrating simulation models exhibiting non-stationarity, combined design of experiments ideas with calibration experiments for efficient sequential computation of simulators, and developed methodology for non-deterministic simulators. On a more personal note, Dr. Pratola’s pastimes include cycling, yoga, music, reading, and food.