Predictive Performance Test Based on Exhaustive Nested Cross-Validation for High-Dimensional Data
Overview
Abstract
Cross-validation is an algorithmic technique widely used for estimating prediction error, tuning regularization parameters, and choosing between competing predictive rules. However, its behavior is difficult to characterize because several interacting factors affect it. In this study, we develop a test based on the exhaustive nested cross-validation procedure that is straightforward to apply, works almost automatically in many settings with a continuous response, and requires no assumptions about the underlying data distribution. In addition, the proposed method provides valid confidence intervals for the difference in prediction error between two model-fitting algorithms. We circumvent the computational cost of exhaustive cross-validation by deriving a computationally efficient expression for the cross-validation estimator. We also present strategies for improving statistical power in high-dimensional settings while maintaining the Type I error rate. Finally, we apply the proposed method to an RNA sequencing study and to other biological data.
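The abstract does not state the efficient expression derived in this work, so the following is only a generic illustration of the underlying idea: for some estimators, an exhaustive cross-validation estimate can be computed exactly from a single fit instead of n refits. As a stand-in (not the method presented in the talk), the sketch uses the classic leave-one-out identity for ridge regression, restricted here, as a simplifying assumption, to a single predictor with no intercept. The held-out residual equals the full-data residual divided by 1 - h_i, where h_i = x_i^2 / (sum of x^2 + lam) is the leverage. The function names are hypothetical.

```python
"""Closed-form leave-one-out CV for one-predictor ridge regression.

Illustrative sketch only: a textbook identity, not the estimator from the talk.
Assumes a single predictor, no intercept, and lam >= 0 with at least two
nonzero x values (so no leverage equals 1).
"""


def ridge_fit(xs, ys, lam):
    # Ridge slope without intercept: beta = sum(x*y) / (sum(x^2) + lam)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)


def loo_cv_brute_force(xs, ys, lam):
    # Refit n times, each time holding out one observation,
    # and average the squared prediction errors on the held-out points.
    n = len(xs)
    total = 0.0
    for i in range(n):
        x_train = xs[:i] + xs[i + 1:]
        y_train = ys[:i] + ys[i + 1:]
        beta = ridge_fit(x_train, y_train, lam)
        total += (ys[i] - xs[i] * beta) ** 2
    return total / n


def loo_cv_shortcut(xs, ys, lam):
    # Single fit on all data: each held-out residual equals the full-data
    # residual divided by (1 - h_i), with leverage h_i = x_i^2 / (sum(x^2) + lam).
    sxx = sum(x * x for x in xs)
    beta = ridge_fit(xs, ys, lam)
    total = 0.0
    for x, y in zip(xs, ys):
        h = x * x / (sxx + lam)
        total += ((y - x * beta) / (1.0 - h)) ** 2
    return total / len(xs)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 1.9, 3.2, 3.9]
    # Both approaches should agree up to floating-point rounding.
    print(loo_cv_brute_force(xs, ys, 0.5), loo_cv_shortcut(xs, ys, 0.5))
```

The brute-force version costs n refits, while the shortcut needs one fit plus one pass over the data. Savings of this kind are what make exhaustive schemes practical, although the talk's estimator for exhaustive nested cross-validation is presumably more involved than this one-predictor case.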
Brief Biography
Iris Ivy M. Gauran is a postdoctoral research fellow at King Abdullah University of Science and Technology (KAUST) in the Biostatistics group led by Prof. Hernando Ombao. She received her PhD in Statistics from the University of Maryland, Baltimore County (UMBC), under the supervision of Dr. Junyong Park and Dr. DoHwan Park. Her research interests include high-dimensional inference, Bayesian data analysis, biostatistics, multiple hypothesis testing, and meta-analysis.