Sutton, Hannah

Resource type

Thesis

Thesis type

(Thesis) M.Sc

Date created

2022-11-21

Authors/Contributors

Author: Sutton, Hannah

Abstract

Random forests are often regarded as black-box machine learning models: they are complex enough that they are not easily interpretable. This has inspired a variety of research into improving the interpretability of random forests, which is the focus of this thesis. Specifically, we capture dissimilarities between the decision trees that comprise a random forest using several comparison functions, allowing the structure of the forest to be quantified. The comparison functions include a phylogenetic metric designed for transmission trees, as well as functions we developed that are based on the count and location of variables in each tree and on the depths of the trees. They allow us to visualize an underlying grouping of the trees using a heatmap and hierarchical clustering, and to analyze the predictive accuracy of the resulting decision tree clusters. Finally, we propose a method for generating random decision trees, and use a small set of such trees to generate synthetic data. We use a random forest trained on this data to determine which comparison functions are statistically significant and contribute to the overall clustering. Additionally, we investigate whether the random forest is capable of recovering the original trees from which the data were generated.
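
To make the idea of a tree comparison function concrete, the following is a minimal sketch in Python. It is not the thesis's implementation: the tuple encoding of a tree, the function names, and the two dissimilarities (absolute difference in per-variable split counts, and absolute difference in depth) are assumptions chosen for illustration, loosely following the "count of variables" and "depths of the trees" ideas mentioned in the abstract. The resulting pairwise matrices are the kind of input a heatmap or hierarchical clustering would take.

```python
from collections import Counter

# Hypothetical tree encoding (an assumption for this sketch):
# a leaf is None; an internal node is (variable_name, left, right).


def depth(tree):
    """Number of split levels on the longest root-to-leaf path."""
    if tree is None:
        return 0
    _, left, right = tree
    return 1 + max(depth(left), depth(right))


def variable_counts(tree):
    """Count how many times each variable is used as a split."""
    counts = Counter()
    stack = [tree]
    while stack:
        node = stack.pop()
        if node is not None:
            variable, left, right = node
            counts[variable] += 1
            stack.extend((left, right))
    return counts


def count_dissimilarity(a, b):
    """Sum over variables of the absolute difference in split counts."""
    ca, cb = variable_counts(a), variable_counts(b)
    return sum(abs(ca[v] - cb[v]) for v in set(ca) | set(cb))


def depth_dissimilarity(a, b):
    """Absolute difference in tree depth."""
    return abs(depth(a) - depth(b))


def pairwise(trees, dissimilarity):
    """Symmetric matrix of dissimilarities between every pair of trees."""
    n = len(trees)
    return [[dissimilarity(trees[i], trees[j]) for j in range(n)]
            for i in range(n)]


# A toy "forest" of three small trees over variables x1, x2, x3.
forest = [
    ("x1", ("x2", None, None), None),
    ("x1", ("x2", None, None), ("x3", None, None)),
    ("x3", None, None),
]

count_matrix = pairwise(forest, count_dissimilarity)
depth_matrix = pairwise(forest, depth_dissimilarity)
```

In this toy forest the first two trees share most of their splits and have equal depth, so they are close under both functions, while the third tree is far from both. Each comparison function yields its own matrix, so different functions may suggest different groupings of the same trees.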

Extent

69 pages.

Identifier

etd22289

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Colijn, Caroline

Thesis advisor: Elliott, Lloyd

Language

English

Member of collection

Individualized Interdisciplinary Studies

Download file	Size
etd22289.pdf	1.13 MB

Quantifying Structure in Random Forests
