Bayesian profile regression with evaluation on simulated data

Date created: 
Bayesian mixture model
Dirichlet process
Profile regression

Regression-based inference is difficult when a data set contains a large number of potentially correlated covariates. Such data sets have become more common in clinical observational studies because of the dramatic improvement in information-capturing technology for clinical databases. For instance, in disease diagnosis and treatment, obtaining many indicators of patients’ organ function is much easier than before, and these indicators can be highly correlated. We discuss Bayesian profile regression, an approach that handles large numbers of correlated covariates, focusing on the binary covariates commonly recorded in clinical databases. Clusters of patients with similar covariate profiles are formed through the application of a Dirichlet process prior and then associated with outcomes via a regression model. Methods for evaluating the clustering and making inference are then described. We use simulated data to compare the performance of Bayesian profile regression with that of the LASSO, a popular alternative for data sets with a large number of predictors. To make these comparisons, we fit the Bayesian profile regression with the recently developed R package PReMiuM.
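
As a rough illustration of the kind of model the abstract describes, the following Python sketch simulates data from a truncated stick-breaking approximation to a Dirichlet process mixture: each patient is assigned to a cluster, binary covariates are drawn from cluster-specific probabilities, and a binary outcome is drawn from a cluster-specific logistic intercept. It is only a forward simulation under stated assumptions, not the fitting procedure used by PReMiuM and not necessarily the simulation design of the project. The function names, the truncation at 20 components, the Beta(0.5, 0.5) prior on covariate probabilities, and the Normal(0, 1.5) prior on the cluster intercepts are all hypothetical choices made for this example.

```python
import math
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking weights for a Dirichlet process
    with concentration parameter alpha (hypothetical helper)."""
    weights = []
    remaining = 1.0
    for _ in range(n_components - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component absorbs the leftover mass
    return weights


def simulate_profiles(n_patients, n_covariates, alpha=1.0,
                      n_components=20, seed=1):
    """Simulate (cluster, binary covariate profile, binary outcome) triples."""
    rng = random.Random(seed)
    weights = stick_breaking_weights(alpha, n_components, rng)
    # Assumed priors: cluster-specific covariate probabilities and
    # cluster-specific log-odds of the outcome.
    phi = [[rng.betavariate(0.5, 0.5) for _ in range(n_covariates)]
           for _ in range(n_components)]
    theta = [rng.gauss(0.0, 1.5) for _ in range(n_components)]

    data = []
    for _ in range(n_patients):
        # Cluster membership drawn according to the mixture weights.
        z = rng.choices(range(n_components), weights=weights)[0]
        # Binary covariates depend only on the patient's cluster.
        x = [1 if rng.random() < phi[z][j] else 0
             for j in range(n_covariates)]
        # Outcome linked to the cluster through a logistic model.
        p_y = 1.0 / (1.0 + math.exp(-theta[z]))
        y = 1 if rng.random() < p_y else 0
        data.append((z, x, y))
    return weights, data


weights, data = simulate_profiles(n_patients=200, n_covariates=10)
```

Because covariates within a cluster share the same probabilities, patients in the same cluster tend to have similar profiles, which is what induces correlation among the covariates in the pooled data.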

Document type: 
Graduating extended essay / Research project
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Jinko Graham
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.