Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Differences in Prescription Drug use Among 5-year Survivors of Childhood, Adolescent, and Young Adult Cancer and the General Population in British Columbia, Canada

Author: 
Date created: 
2017-07-13
Abstract: 

In this project, we analyze the prescription drug use of childhood, adolescent, and young adult cancer survivors identified by the CAYACS program in BC. Understanding the patterns of prescription use and factors associated with the tendency to be on prescriptions is important to policy and health care planners. Since data on actual prescription usage are not available, we use prescription dispensing data as a proxy. We examine the differences in prescription use between survivors and matched controls selected from the general population, and assess the impact of age and other clinical and sociodemographic factors on prescription use. Specifically, we model subjects' on-/off-prescription status by a first-order Markov transition model, and handle the between-subject heterogeneity using a random effect. Our method captures the differences in prescription drug use between survivors and the general population, as well as differences within the survivor population. Our results show that survivors tend to exhibit a higher probability of going on prescriptions compared to the general population over the course of their lifetime. Further, females appear to have a higher probability of going on prescriptions than males over the course of their lifetime. A simulation study is conducted to assess the performance of the estimators of the model.
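As an illustration of the modelling idea only, a first-order Markov transition model with a subject-level random effect can be sketched as below. The logistic link, the coefficient names (`beta0`, `beta_prev`, `beta_survivor`), the normal random intercept, and the off-prescription starting state are all assumptions made for this sketch; the abstract does not specify the project's actual parameterization.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_status(n_periods, beta0, beta_prev, beta_survivor,
                    survivor, sigma_b, rng, initial_status=0):
    """Simulate one subject's on (1) / off (0) prescription status.

    Hypothetical parameterization: the probability of being on a
    prescription in the current period depends on the previous period's
    status (first-order Markov), a survivor indicator, and a
    subject-specific random intercept b ~ N(0, sigma_b^2).
    """
    b = rng.gauss(0.0, sigma_b)  # between-subject heterogeneity
    status = [initial_status]
    for _ in range(1, n_periods):
        eta = beta0 + beta_prev * status[-1] + beta_survivor * survivor + b
        status.append(1 if rng.random() < expit(eta) else 0)
    return status


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_status(24, -1.0, 2.0, 0.5, survivor=1, sigma_b=1.0, rng=rng))
```

A positive `beta_prev` in this sketch makes a subject who is on a prescription more likely to stay on it, while `sigma_b` controls how much subjects differ in their baseline tendency.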

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Distributions of Time to First Spot Fire

Author: 
Date created: 
2017-08-15
Abstract: 

In wildfire management, a spot fire is the result of an airborne ember igniting a separate fire away from the main wildfire. Under certain environmental and wildfire conditions, a burning ember can breach a fuel break, such as a river or road, and result in the production of a spot fire. This project derives distributions of the time to the first spot fire in various situations, and verifies them by simulation. To demonstrate the implementation of the distributions in practice, we incorporate a stochastic fire spread model. This research assesses the likelihood of a spot fire occurring past a fuel break, taking into account both spotting distance and spotting rate. Unlike the traditional approach, which considers only the maximal spotting distance, this method can serve as a tool for fire management.
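To make the idea of combining spotting rate and spotting distance concrete, here is a toy sketch that is not taken from the project. It assumes embers are launched as a homogeneous Poisson process with a constant rate, and that each ember's travel distance is exponentially distributed; both are assumptions chosen for simplicity. Under them, the time to the first spot fire beyond a break of a given width is exponential with the thinned rate, and a simulation can be used to check that.

```python
import math
import random


def first_spot_cdf(t, rate, mean_distance, break_width):
    """P(first spot fire past the break occurs by time t) in the toy model.

    Assumed model: embers launch at `rate` per unit time (Poisson), each
    travels an Exponential(mean_distance) distance, so a fraction
    p = exp(-break_width / mean_distance) clears the break.
    """
    p = math.exp(-break_width / mean_distance)
    return 1.0 - math.exp(-rate * p * t)


def simulate_first_spot_time(rate, mean_distance, break_width, rng):
    """Simulate ember launches until one lands past the fuel break."""
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # waiting time to the next ember
        distance = rng.expovariate(1.0 / mean_distance)
        if distance > break_width:
            return t


if __name__ == "__main__":
    rng = random.Random(42)
    times = [simulate_first_spot_time(2.0, 50.0, 50.0, rng) for _ in range(20000)]
    print("simulated mean:", sum(times) / len(times))
    print("theoretical mean:", 1.0 / (2.0 * math.exp(-1.0)))
```

The comparison of the simulated and theoretical means mirrors, in a very reduced form, the derive-then-verify-by-simulation workflow the abstract describes.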

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Mendelian randomization for causal inference of the relationship between obesity and 28-day survival following septic shock

Date created: 
2017-08-10
Abstract: 

Septic shock is a leading cause of death in intensive care units. Septic shock occurs when a body-wide infection leads to low blood pressure, and ultimately organ failure. Some recent studies suggest that overweight and obese patients have a better chance of survival following septic shock than normal or underweight patients. In this project we apply Mendelian randomization to assess whether the observed obesity effect on 28-day survival following septic shock is causal or more likely due to unmeasured confounding variables. Mendelian randomization is an instrumental variables approach that uses genetic markers as instruments. Under modelling assumptions, unconfounded estimates of the obesity effect can be obtained by fitting a model for 28-day survival that includes a residual obesity term. Data for the project comes from the Vasopressin and Septic Shock Trial (VASST). Our analysis suggests that the observed obesity effect on survival following septic shock is not causal.
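The residual-inclusion idea can be sketched on simulated data as follows. This is a simplified illustration, not the project's analysis: it uses a continuous outcome and ordinary least squares in place of a 28-day survival model, a single hypothetical genetic marker `g`, and a simulated unmeasured confounder `u`. The true causal effect of the exposure is set to zero, so the naive regression is biased while the residual-inclusion estimate should be near zero.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def two_stage_residual_inclusion(g, x, y):
    """Return the exposure coefficient from a two-stage residual inclusion fit."""
    # Stage 1: regress exposure x on instrument g and keep the residual.
    slope = cov(g, x) / cov(g, g)
    intercept = mean(x) - slope * mean(g)
    resid = [xi - intercept - slope * gi for gi, xi in zip(g, x)]
    # Stage 2: regress outcome y on x and the stage-1 residual
    # (2x2 normal equations on centred data).
    sxx, srr, sxr = cov(x, x), cov(resid, resid), cov(x, resid)
    sxy, sry = cov(x, y), cov(resid, y)
    det = sxx * srr - sxr ** 2
    return (sxy * srr - sry * sxr) / det


def simulate(n, rng):
    """Simulated data with an unmeasured confounder and no causal effect of x on y."""
    g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]  # allele count
    u = [rng.gauss(0, 1) for _ in range(n)]                            # confounder
    x = [0.5 * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]      # exposure
    y = [ui + rng.gauss(0, 1) for ui in u]                             # outcome
    return g, x, y


if __name__ == "__main__":
    g, x, y = simulate(20000, random.Random(1))
    print("naive slope:", cov(x, y) / cov(x, x))
    print("residual-inclusion estimate:", two_stage_residual_inclusion(g, x, y))
```

With a linear outcome model, this estimate coincides with the usual two-stage least squares estimate; for a binary survival outcome the second stage would instead be a suitable survival or logistic model.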

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Brad McNeney
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

A Multi-Dimensional Bühlmann Credibility Approach to Modeling Multi-Population Mortality Rates

Author: 
Date created: 
2017-06-08
Abstract: 

In this project, we first propose a multi-dimensional Bühlmann credibility approach to forecasting mortality rates for multiple populations, and then compare forecasting performances among the proposed approach and the joint-k/co-integrated/augmented common factor Lee-Carter models. The model is applied to mortality data of the Human Mortality Database for both genders of three well-developed countries with an age span and a wide range of fitting year spans. Empirical illustrations show that the proposed multi-dimensional Bühlmann credibility approach contributes to more accurate forecast results, measured by MAPE (mean absolute percentage error), than those based on the Lee-Carter model.
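For reference, the MAPE measure named above can be computed as in this minimal sketch. The function name and the example numbers are illustrative only and are not taken from the project.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Two made-up observed values and forecasts, each off by 10%.
    print(mape([100.0, 200.0], [110.0, 180.0]))
```

A lower MAPE indicates forecasts that are closer, in relative terms, to the observed mortality rates.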

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Cary Chi-Liang Tsai
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Using AI and Statistical Techniques to Correct Play-by-play Substitution Errors

Author: 
Date created: 
2017-05-26
Abstract: 

Play-by-play is an important data source for basketball analysis, particularly for leagues that cannot afford the infrastructure for collecting video tracking data; it enables advanced metrics like adjusted plus-minus and lineup analysis like With Or Without You (WOWY). However, this analysis is not possible unless all substitutions are recorded and are correct. In this paper we use six seasons of play-by-play from the Canadian university league to derive a framework for automated cleaning of play-by-play that is littered with substitution logging errors. These errors include missing substitutions, unequal number of players subbing in and out, substitution patterns of a player not alternating between in/out, and more. We define features to build a prediction model for identifying correct/incorrect recorded substitutions and outline a simple heuristic for player activity to use for inferring the players who were not accounted for in the substitutions. We define two performance measures for objectively quantifying the effectiveness of this framework. The play-by-play which results from the algorithm opens up a set of statistics that were not obtainable for the Canadian university league which improves their analytics capabilities; coaches can improve strategy leading to a more competitive product, and media can introduce modern statistics in their coverage to increase engagement from fans.
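Two of the error types listed above (non-alternating in/out patterns and unequal numbers of players subbing in and out) can be detected with simple bookkeeping, as in the sketch below. The event format `(time, player, action)` and the function name are hypothetical; the project's actual framework uses a prediction model and a player-activity heuristic that are not reproduced here.

```python
from collections import defaultdict


def find_substitution_errors(starters, events):
    """Flag basic logging errors in a list of substitution events.

    `events` is a hypothetical format: (time, player, action) tuples in
    chronological order, with action equal to "in" or "out".
    """
    on_court = set(starters)
    errors = []
    counts = defaultdict(lambda: {"in": 0, "out": 0})
    for time, player, action in events:
        if action not in ("in", "out"):
            raise ValueError(f"unknown action: {action!r}")
        counts[time][action] += 1
        if action == "in":
            if player in on_court:
                errors.append((time, player, "subbed in while already on court"))
            on_court.add(player)
        else:
            if player not in on_court:
                errors.append((time, player, "subbed out while not on court"))
            on_court.discard(player)
    # Substitutions logged at the same time should balance.
    for time, c in counts.items():
        if c["in"] != c["out"]:
            errors.append((time, None, "unequal number of players in and out"))
    return errors


if __name__ == "__main__":
    starters = ["A", "B", "C", "D", "E"]
    events = [(10, "A", "out"), (10, "F", "in"),
              (20, "A", "out"), (20, "G", "in"),
              (30, "B", "out")]
    for err in find_substitution_errors(starters, events):
        print(err)
```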

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An applied analysis of high-dimensional logistic regression

Author: 
Date created: 
2017-05-16
Abstract: 

In the high-dimensional setting, we investigate common regularization approaches for fitting logistic regression models with binary response variables. A literature review is provided on generalized linear models; regularization approaches, including the lasso, ridge, elastic net and relaxed lasso; and recent post-selection methods for obtaining p-values of coefficient estimates proposed by Lockhart et al. and Bühlmann et al. We consider varying n, p conditions, and assess model performance based on several evaluation metrics, such as sparsity, accuracy and algorithmic time efficiency. Through a simulation study, we find that the multi-sample-splitting method of Bühlmann et al. performed poorly when selected covariates were highly correlated. When λ was chosen through cross-validation, the elastic net performed similarly to the lasso, but it did not possess the level of sparsity Zou and Hastie have suggested.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Richard Lockhart
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Bayesian Sensitivity Analysis for Non-ignorable Missing Data in Longitudinal Studies

Author: 
Date created: 
2017-04-13
Abstract: 

The use of Bayesian statistical methods to handle missing data in biomedical studies has become popular in recent years. In this thesis, we propose a novel Bayesian sensitivity analysis (BSA) model that accounts for the influences of missing outcome data on the estimation of treatment effects in randomized controlled trials with non-ignorable missing data. We implement the method using the probabilistic programming language Stan, and apply it to data from the Vancouver At Home (VAH) Study, which is a randomized controlled trial that provided housing to homeless people with mental illness. We compare the results of BSA to those from an existing Bayesian longitudinal model that ignores missingness in the outcome. Furthermore, we demonstrate in a simulation study that, when a diffuse conservative prior that describes a range of assumptions about the bias effect is used, BSA credible intervals have greater length and higher coverage rate of the target parameters than existing methods, and that sensitivity increases as the percentage of missingness increases.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Lawrence McCandless
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Delta Hedging for Single Premium Segregated Fund

Author: 
Date created: 
2017-03-31
Abstract: 

Segregated funds are individual insurance contracts that offer growth potential of investment in underlying assets while providing a guarantee to protect part of the money invested. The guarantee can cause significant losses to the insurer, which makes it essential for the insurer to hedge this risk. In this project, we discuss the hedging effectiveness of delta hedging by studying the distribution of hedging errors under different assumptions about the return on underlying assets. We consider a geometric Brownian motion and a regime-switching lognormal model for equity returns, and compare the hedging effectiveness when risk-free rates are constant or stochastic. Two one-factor short-rate models, the Vasicek and CIR models, are used to model the risk-free rate. We find that delta hedging is in general effective, but large hedging errors can occur when the assumptions of the Black-Scholes framework are violated.
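A heavily simplified sketch of a delta-hedging error calculation follows. It treats the maturity guarantee as a plain European put under Black-Scholes with a constant risk-free rate, and ignores fees, mortality and lapses; these simplifications are assumptions for illustration and do not reflect the project's full setup. The function names are hypothetical.

```python
import math
import random


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def put_price_and_delta(s, k, r, sigma, tau):
    """Black-Scholes price and delta of a European put with time tau to maturity."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = k * math.exp(-r * tau) * norm_cdf(-d2) - s * norm_cdf(-d1)
    return price, norm_cdf(d1) - 1.0


def hedging_error(s0, k, r, sigma, maturity, steps, rng):
    """Terminal hedge portfolio value minus guarantee payoff on one GBM path."""
    dt = maturity / steps
    s = s0
    price, delta = put_price_and_delta(s, k, r, sigma, maturity)
    cash = price - delta * s  # portfolio starts at the put value
    for i in range(1, steps + 1):
        z = rng.gauss(0.0, 1.0)
        s *= math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        cash *= math.exp(r * dt)  # cash account accrues interest
        if i < steps:
            # Rebalance to the new delta, financing the trade from cash.
            _, new_delta = put_price_and_delta(s, k, r, sigma, maturity - i * dt)
            cash -= (new_delta - delta) * s
            delta = new_delta
    return delta * s + cash - max(k - s, 0.0)


if __name__ == "__main__":
    rng = random.Random(7)
    errors = [hedging_error(100.0, 100.0, 0.0, 0.2, 1.0, 250, rng) for _ in range(200)]
    print("mean hedging error:", sum(errors) / len(errors))
```

Repeating the calculation over many simulated paths gives an empirical distribution of hedging errors; replacing the path generator with a model that departs from the Black-Scholes assumptions is one way to explore the kind of comparison the abstract describes.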

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Gary Parker
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Analysis of Target Benefit Plans with Aggregate Cost Method

Author: 
Date created: 
2017-04-06
Abstract: 

The operational characteristics of a target benefit plan based on an aggregate pension cost method are studied through simulation under a multivariate time series model for projected interest rates and equity returns. The performance of the target benefit plan is evaluated by applying a variety of performance metrics for benefit security, benefit adequacy, benefit stability and intergenerational equity. Performance is shown to improve when the economy remains relatively stable over time and when the choice of valuation rate does not create persistent gains or losses.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Gary Parker
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Predictive Estimation in Canadian Federal Elections

Date created: 
2017-04-20
Abstract: 

Various estimation methods are employed to provide seat projections during Canadian federal elections. This project explores discrepancies between the real outcomes of recent Canadian federal elections and the predictions by existing approaches, such as the ones proposed by Grenier and Rosenthal. It appears that each seat projection procedure requires a set of assumptions, but the assumptions are not explicitly listed in the accessible references. We formulate the required assumptions used in the two prediction procedures proposed by Rosenthal, and present variance estimation procedures. Departures from the assumptions are explored with real data from the 2006, 2008, 2011, and 2015 federal elections. An extensive simulation study is conducted to examine potential impacts of various deviations from the assumptions. The simulation indicates that, compared to other assumption violations, misleading polling results may cause the most damage to the prediction. In addition, the simulation shows that the prediction is least affected by a change in the number of voters, and that the heterogeneity of riding patterns within a region may not affect the prediction at the national level.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.