An applied analysis of high-dimensional logistic regression

Author: 
Date created: 
2017-05-16
Identifier: 
etd10164
Keywords: 
High dimensional
Logistic regression
Lasso
Elastic net
Significance test
Abstract: 

In the high dimensional setting, we investigate common regularization approaches for fitting logistic regression models with binary response variables. A literature review is provided on generalized linear models, regularization approaches which include the lasso, ridge, elastic net and relaxed lasso, and recent post-selection methods for obtaining p-values of coefficient estimates proposed by Lockhart et. al. and Buhlmann et. al. We consider varying n, p conditions, and assess model performance based on several evaluation metrics - such as their sparsity, accuracy and algorithmic time efficiency. Through a simulation study, we find that Buhlmann et. al’s multi sample splitting method performed poorly when selected covariates were highly correlated. When λ was chosen through cross validation, the elastic net had similar levels of performance as compared to the lasso, but it did not possess the level of sparsity Zou and Hastie have suggested.

Document type: 
Graduating extended essay / Research project
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
File(s): 
Senior supervisor: 
Richard Lockhart
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.
Statistics: