Evaluating the Impact of Heteroscedasticity on the Predictive Ability of Modern Regression Techniques

Author: 
Date created: 
2014-08-22
Identifier: 
etd8605
Keywords: 
Regression
Heteroscedasticity
Random forest
Multivariate adaptive regression splines
LASSO
BART
Regression tree
Data mining
Machine learning
Abstract: 

Over the last decade, the number and sophistication of methods used for regression on complex data sets have increased substantially. Despite this, our literature review found little research on how heteroscedasticity affects many widely used modern regression methods. Our research therefore seeks to clarify the impact of heteroscedasticity on the predictive effectiveness of these methods. We begin by analyzing the ability of ten modern regression methods to predict outcomes for three medium-sized data sets that each feature heteroscedasticity. We then use insights from this work to develop a simulation model and design an experiment that explores the impact of various factors on the prediction accuracy of the ten methods. These factors include linearity, sparsity, the signal-to-noise ratio, the number of explanatory variables, and the use of a variance-stabilizing transformation.
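
As a rough illustration of the setting the abstract describes, the sketch below simulates data whose noise grows with the mean and then checks how a log transformation evens out the spread. It is not the simulation model developed in the thesis: the model form, the coefficients, the noise level, and the function names (simulate_heteroscedastic, residual_sd_by_half) are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_heteroscedastic(n=2000, seed=0):
    """Draw (x, y) pairs whose noise standard deviation grows with the mean.

    Hypothetical model, for illustration only:
        y = exp(1 + 2x + eps),  eps ~ Normal(0, 0.3),  x ~ Uniform(0, 1).
    The noise is multiplicative, so sd(y | x) is proportional to exp(1 + 2x).
    """
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [math.exp(1.0 + 2.0 * x + rng.gauss(0.0, 0.3)) for x in xs]
    return xs, ys


def residual_sd_by_half(xs, values, trend):
    """Standard deviation of residuals for x < 0.5 and for x >= 0.5."""
    low = [v - trend(x) for x, v in zip(xs, values) if x < 0.5]
    high = [v - trend(x) for x, v in zip(xs, values) if x >= 0.5]
    return statistics.stdev(low), statistics.stdev(high)


xs, ys = simulate_heteroscedastic()

# Raw scale: residuals around the true median curve exp(1 + 2x).
raw_low, raw_high = residual_sd_by_half(xs, ys, lambda x: math.exp(1.0 + 2.0 * x))

# Log scale: for this assumed model the log acts as a variance-stabilizing
# transformation, because log(y) = 1 + 2x + eps has constant noise variance.
log_ys = [math.log(y) for y in ys]
log_low, log_high = residual_sd_by_half(xs, log_ys, lambda x: 1.0 + 2.0 * x)

print(f"raw scale residual sd: low x = {raw_low:.2f}, high x = {raw_high:.2f}")
print(f"log scale residual sd: low x = {log_low:.2f}, high x = {log_high:.2f}")
```

On the raw scale the residual spread should be clearly larger for large x than for small x, while on the log scale the two halves should be close. Whether a transformation of this kind helps or hurts prediction for a given regression method is the sort of question the experiment described above is designed to examine.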

Document type: 
Graduating extended essay / Research project
Rights: 
Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Supervisor(s): 
Tom Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.