An efficient approach to pruning regression trees using a modified Bayesian information criterion

Author: 
Date created: 
2021-04-14
Identifier: 
etd21304
Keywords: 
Regression trees
Pruning
Information criteria
Cross-validation
Machine learning
Abstract: 

By identifying relationships between regression tree construction and change-point detection, we show that it is possible to prune a regression tree efficiently using properly modified information criteria. We prove that one of the proposed pruning approaches that uses a modified Bayesian information criterion consistently recovers the true tree structure provided that the true regression function can be represented as a subtree of a full tree. In practice, we obtain simplified trees that can have prediction accuracy comparable to trees obtained using standard cost-complexity pruning. We briefly discuss an extension to random forests that prunes trees adaptively in order to prevent excessive variance, building upon the work of other authors.

Document type: 
Graduating extended essay / Research project
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
File(s): 
Supervisor(s): 
Thomas Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.
Statistics: