
Penalized likelihood methods for sparse datasets, with applications to genetic epidemiology

Thesis type: (Thesis) Ph.D.
Author: Yu, Ying
Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare exposures. Sparsity leads to maximum likelihood estimates (MLEs) of log odds-ratio parameters that are biased away from their null value of zero, and to tests with inflated type I error rates. Several penalized-likelihood methods have been developed to mitigate sparse-data bias. We study penalized logistic and conditional logistic regression using a class of log-F priors, indexed by a shrinkage parameter m, that shrink the biased MLE towards zero. The thesis is organized in three parts. First, we propose a two-step methodology for implementing log-F penalization for inference of regression parameters from logistic regression, with application to genome-wide association studies. In the first step we estimate the shrinkage parameter m; in the second step we use the penalized regression estimator to estimate single-variant associations across the genome. Next, we explore log-F penalization for inference of regression parameters from conditional logistic regression, with application to data from matched case-control and case-parent trio studies. In these first two projects we use simulation to study the statistical properties of our methods and compare them with methods that use Firth penalization. Finally, we apply log-F-penalized logistic regression to data from the UK Biobank to investigate the method's feasibility for genome-wide, biobank-scale data. The size and complexity of biobank data present unique challenges, and we modify our methodology to make it more flexible and adaptable to such datasets.
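To make the shrinkage idea concrete, here is a minimal sketch (not the thesis's implementation) of log-F(m, m) penalized logistic regression for a single variant with a fixed m. It assumes the model logit P(y = 1) = a + b·x, penalizes only the log odds-ratio b, and leaves the intercept a unpenalized. The added term m·b/2 − m·log(1 + e^b) is, up to a constant, the log-density of a log-F(m, m) prior on b. The function name `logf_logistic` and the toy data are hypothetical. The sketch does not estimate m; it simply takes m as given.

```python
import math


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def logf_logistic(x, y, m, tol=1e-10, max_iter=100):
    """Hypothetical sketch: fit logit P(y=1) = a + b*x with a log-F(m, m) penalty on b.

    Penalized log-likelihood = logistic log-likelihood + m*b/2 - m*log(1 + exp(b)).
    The intercept a is unpenalized (an assumption of this sketch).
    Returns (a, b) found by Newton-Raphson; the objective is concave for m > 0.
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        # Score (gradient) and negative Hessian of the penalized log-likelihood.
        ga = gb = 0.0
        haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = expit(a + b * xi)
            w = p * (1.0 - p)
            ga += yi - p
            gb += (yi - p) * xi
            haa += w
            hab += w * xi
            hbb += w * xi * xi
        pb = expit(b)
        gb += m / 2.0 - m * pb      # derivative of the log-F penalty
        hbb += m * pb * (1.0 - pb)  # curvature contributed by the penalty
        det = haa * hbb - hab * hab
        # Newton step: solve [[haa, hab], [hab, hbb]] @ step = [ga, gb].
        da = (hbb * ga - hab * gb) / det
        db = (haa * gb - hab * ga) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Toy sparse data: 3 carriers of a rare variant, all of them cases,
# plus 7 non-carrier cases and 90 non-carrier controls.
x = [1] * 3 + [0] * 97
y = [1] * 3 + [1] * 7 + [0] * 90

a1, b1 = logf_logistic(x, y, m=1.0)
a4, b4 = logf_logistic(x, y, m=4.0)
print(round(b1, 2), round(b4, 2))
```

In this toy example every carrier is a case, so the unpenalized MLE of b does not exist (it diverges to +∞). With m = 1 the penalized estimate is finite (about 4.13), and increasing m to 4 pulls it further towards zero. An equivalent way to fit the same penalty is data augmentation. You add one pseudo-record with x = 1 and no intercept contribution, carrying m/2 successes out of m trials, and then run ordinary logistic regression.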
88 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: McNeney, Brad
File: etd22690.pdf (5.9 MB)