Predictive Performance of Regression and Machine Learning Models under Correlated Predictor Structures: A Monte Carlo Study
Wunukhen Agbu Hosea
Department of Statistics, University of Nigeria, Nsukka, Enugu State, Nigeria.
Abimibola Victoria Oladugba
Department of Statistics, University of Nigeria, Nsukka, Enugu State, Nigeria.
Uchenna Chinedu Nduka *
Department of Statistics, University of Nigeria, Nsukka, Enugu State, Nigeria.
*Author to whom correspondence should be addressed.
Abstract
Multicollinearity is widely recognized for its adverse effects on parameter estimation and statistical inference in regression analysis; however, its influence on predictive performance remains less established, particularly for machine learning methods. This study investigated the effect of multicollinearity on the predictive performance of classical regression and machine learning models using a Monte Carlo simulation approach. Seven predictive methods were considered, namely Linear Regression (LR), Stepwise Linear Regression (SLR), Least Absolute Shrinkage and Selection Operator (LASSO), Regression Trees (RT), Random Forests (RF), Multivariate Adaptive Regression Splines (MARS), and Artificial Neural Networks (ANN). Predictor variables were generated from multivariate normal distributions under nine correlation structures representing weak, moderate, and strong multicollinearity conditions. For each simulation scenario, 1,000 datasets of size 500 were generated and partitioned into training and testing datasets. Predictive performance was evaluated using Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). The results showed that the predictive methods generally maintained stable performance across the varying multicollinearity scenarios, with no substantial decline in prediction accuracy as predictor correlation increased. Although slight variations were observed across methods, LR, LASSO, RF, MARS, and ANN consistently demonstrated competitive predictive performance under correlated predictor structures.
Keywords: Multicollinearity, Monte Carlo simulation, machine learning, predictive performance, regression models.
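The simulation design summarized in the abstract can be illustrated with a minimal sketch. It covers only the linear regression (LR) case and rests on several assumptions that the abstract does not specify: an equicorrelated structure with a single correlation `rho`, five predictors, unit true coefficients, an intercept of 50 (chosen so that the response stays away from zero and MAPE is well defined), unit noise variance, a 70/30 train/test split, and 200 replications instead of the study's 1,000. The function names `equicorrelation`, `simulate_once`, and `run_study` are hypothetical and are not taken from the study.

```python
import numpy as np


def equicorrelation(p, rho):
    """Return a p x p correlation matrix with common off-diagonal value rho."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))


def simulate_once(rng, n=500, p=5, rho=0.9, train_frac=0.7):
    """Generate one dataset, fit OLS on the training part, score the test part."""
    # Predictors drawn from a multivariate normal with the chosen correlation.
    X = rng.multivariate_normal(np.zeros(p), equicorrelation(p, rho), size=n)
    beta = np.ones(p)  # assumed true coefficients (not from the study)
    # Assumed intercept of 50 keeps y far from zero so MAPE is stable.
    y = 50.0 + X @ beta + rng.normal(0.0, 1.0, size=n)

    # Rows are i.i.d., so a simple positional split is a valid random split.
    n_train = int(train_frac * n)
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]

    # Ordinary least squares with an intercept column.
    A_tr = np.column_stack([np.ones(n_train), X_tr])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    pred = np.column_stack([np.ones(len(X_te)), X_te]) @ coef

    mse = np.mean((y_te - pred) ** 2)
    mape = 100.0 * np.mean(np.abs((y_te - pred) / y_te))
    return mse, mape


def run_study(rhos=(0.1, 0.5, 0.9), reps=200, seed=0):
    """Average test MSE and MAPE over `reps` simulated datasets per rho."""
    rng = np.random.default_rng(seed)
    results = {}
    for rho in rhos:
        scores = np.array([simulate_once(rng, rho=rho) for _ in range(reps)])
        results[rho] = tuple(scores.mean(axis=0))
    return results


if __name__ == "__main__":
    for rho, (mse, mape) in run_study().items():
        print(f"rho={rho:.1f}  mean MSE={mse:.3f}  mean MAPE={mape:.2f}%")
```

Because the assumed data-generating model is linear and correctly specified, the average test MSE in this sketch should stay close to the noise variance at every value of `rho`. That behaviour is a property of the sketch's assumptions and should not be read as a reproduction of the study's results.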