In this project, we consider a simple new approach to variable selection in linear regression based on the Sum-of-Single-Effects model. The approach is particularly well-suited to big-data settings where variables are highly correlated and effects are sparse. The approach shares the computational simplicity and speed of traditional stepwise methods of variable selection in regression, but instead of selecting a single variable at each step, computes a distribution on variables that captures uncertainty in which variable to select. This uncertainty in variable selection is summarized conveniently by credible sets of variables with an attached probability for the entire set. To illustrate the approach, we apply it to a big-data problem in genetics.
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Graham, Jinko
Member of collection