Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Estimating the conditional intensity function of a neural spike train by particle Markov chain Monte Carlo and smoothing

Author: 
Date created: 
2017-08-14
Abstract: 

Understanding neural activity is fundamental and challenging in decoding how the brain processes information. An essential part of the problem is to define a meaningful, quantitative characterization of neural activity when it is represented by a sequence of action potentials, or a neural spike train. This thesis represents a neural spike train as a point process, a representation that provides a conditional intensity function (CIF) describing the neural activity. The estimation procedure for the CIF, comprising particle Markov chain Monte Carlo (PMCMC) and smoothing, is introduced and applied to a real data set. From the estimated CIF of a neural spike train and its derivative, we can observe adaptation behavior. A simulation study verifies that the estimation procedure provides reliable estimates of the CIF. This framework gives a definite quantification of neural activity and facilitates further investigation of the brain from a neurological perspective.
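As a toy illustration of the point-process representation (hypothetical code and rates, not the thesis implementation), a spike train with a time-varying conditional intensity can be simulated by thinning:

```python
import math
import random

def simulate_spike_train(intensity, t_max, lam_max, seed=0):
    """Simulate a spike train from a time-varying intensity via thinning.

    intensity: function t -> firing rate (spikes/s), bounded above by lam_max.
    Returns the list of spike times in [0, t_max].
    """
    rng = random.Random(seed)
    t, spikes = 0.0, []
    while True:
        # Propose the next candidate event from a homogeneous process at rate lam_max.
        t += rng.expovariate(lam_max)
        if t > t_max:
            return spikes
        # Accept the candidate with probability intensity(t) / lam_max.
        if rng.random() * lam_max < intensity(t):
            spikes.append(t)

# Hypothetical CIF whose rate adapts (decays) after an initial burst.
cif = lambda t: 40.0 * math.exp(-t) + 5.0
spikes = simulate_spike_train(cif, t_max=5.0, lam_max=45.0)
```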

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Adjusting for Scorekeeper Bias in NBA Box Scores

Date created: 
2017-06-01
Abstract: 

Box score statistics in the National Basketball Association are used to measure and evaluate player performance. Some of these statistics are subjective in nature, and since box score statistics are recorded by scorekeepers hired by the home team for each game, there exists potential for inconsistency and bias. These inconsistencies can have far-reaching consequences, particularly with the rise in popularity of daily fantasy sports. Using box score data, we estimate models that quantify both the bias and the generosity of each scorekeeper for two of the most subjective statistics: assists and blocks. We then use optical player tracking data for the 2015-2016 season to improve the assist model by including other contextual spatio-temporal variables such as time of possession, player locations, and distance traveled. From this model, we present results measuring the impact of the scorekeeper and of the other contextual variables on the probability of a pass being recorded as an assist. Results for adjusting season assist totals to remove scorekeeper influence are also presented.
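A toy sketch of the underlying idea, with fabricated data (the thesis models are richer, regression-based, and fit to real box scores): a scorekeeper's generosity can be summarized as the ratio of that scorekeeper's assist-recording rate to the league-wide rate.

```python
import random

def scorekeeper_generosity(passes):
    """Estimate each scorekeeper's generosity as the ratio of their
    assist-recording rate to the league-wide rate.

    passes: list of (scorekeeper, recorded_as_assist) tuples, where
    recorded_as_assist is 0/1 for comparable potential-assist passes.
    """
    league = sum(a for _, a in passes) / len(passes)
    by_keeper = {}
    for k, a in passes:
        by_keeper.setdefault(k, []).append(a)
    return {k: (sum(v) / len(v)) / league for k, v in by_keeper.items()}

# Hypothetical data: keeper "A" records 60% of such passes, "B" only 40%.
rng = random.Random(1)
data = [("A", 1 if rng.random() < 0.6 else 0) for _ in range(1000)] + \
       [("B", 1 if rng.random() < 0.4 else 0) for _ in range(1000)]
gen = scorekeeper_generosity(data)
```

A generosity above 1 flags a scorekeeper who records assists more liberally than the league norm.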

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Luke Bornn
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Differences in Prescription Drug Use Among 5-Year Survivors of Childhood, Adolescent, and Young Adult Cancer and the General Population in British Columbia, Canada

Author: 
Date created: 
2017-07-13
Abstract: 

In this project, we analyze the prescription drug use of childhood, adolescent, and young adult cancer survivors identified by the CAYACS program in BC. Understanding the patterns of prescription use and factors associated with the tendency to be on prescriptions is important to policy and health care planners. Since data on actual prescription usage are not available, we use prescription dispensing data as a proxy. We examine the differences in prescription use between survivors and matched controls selected from the general population, and assess the impact of age and other clinical and sociodemographic factors on prescription use. Specifically, we model subjects' on-/off-prescription status by a first-order Markov transition model, and handle the between-subject heterogeneity using a random effect. Our method captures the differences in prescription drug use between survivors and the general population, as well as differences within the survivor population. Our results show that survivors tend to exhibit a higher probability of going on prescriptions compared to the general population over the course of their lifetime. Further, females appear to have a higher probability of going on prescriptions than males over the course of their lifetime. A simulation study is conducted to assess the performance of the estimators of the model.
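To illustrate the modeling idea (hypothetical parameters and data, not the CAYACS analysis), a first-order Markov transition model for on/off-prescription status with a subject-level random effect can be simulated as follows:

```python
import math
import random

def simulate_subject(n_visits, beta_on, beta_off, b, rng):
    """Simulate one subject's on(1)/off(0) prescription status as a
    first-order Markov chain; b is a subject-level random effect that
    shifts the transition probabilities on the logit scale."""
    expit = lambda x: 1.0 / (1.0 + math.exp(-x))
    state, path = 0, [0]
    for _ in range(n_visits - 1):
        if state == 0:
            p_next_on = expit(beta_on + b)          # P(off -> on)
        else:
            p_next_on = 1.0 - expit(beta_off - b)   # 1 - P(on -> off)
        state = 1 if rng.random() < p_next_on else 0
        path.append(state)
    return path

rng = random.Random(2)
# Between-subject heterogeneity: normal random effects across 200 subjects.
paths = [simulate_subject(50, beta_on=-1.0, beta_off=-0.5,
                          b=rng.gauss(0, 1), rng=rng) for _ in range(200)]
```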

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Distributions of Time to First Spot Fire

Author: 
Date created: 
2017-08-15
Abstract: 

In wildfire management, a spot fire is the result of an airborne ember igniting a separate fire away from the main wildfire. Under certain environmental and wildfire conditions, a burning ember can breach a fuel break, such as a river or road, and produce a spot fire. This project derives distributions of the time to the first spot fire in various situations and verifies them by simulation. To demonstrate the implementation of the distributions in practice, we incorporate a stochastic fire spread model. This research assesses the likelihood of a spot fire occurring past a fuel break while taking into account both spotting distance and spotting rate. This contrasts with the traditional approach, which relies solely on the maximal spotting distance, and can serve as a tool for fire management.
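A minimal sketch of the simplest such situation (hypothetical rates, not the project's fire spread model): when embers arrive as a Poisson process and each independently lands beyond the break, the time to the first spot fire is exponential, which a simulation can verify.

```python
import random
import statistics

def time_to_first_spot_fire(rate, breach_prob, rng):
    """Embers are produced as a Poisson process with the given spotting rate;
    each ember independently lands beyond the fuel break with probability
    breach_prob (from a spotting-distance distribution). The time to the
    first breaching ember is then exponential with rate rate * breach_prob."""
    t = 0.0
    while True:
        t += rng.expovariate(rate)
        if rng.random() < breach_prob:
            return t

rng = random.Random(3)
times = [time_to_first_spot_fire(rate=2.0, breach_prob=0.25, rng=rng)
         for _ in range(20000)]
# The simulated mean should be close to 1 / (rate * breach_prob) = 2.0.
mean_t = statistics.fmean(times)
```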

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Mendelian randomization for causal inference of the relationship between obesity and 28-day survival following septic shock

Date created: 
2017-08-10
Abstract: 

Septic shock is a leading cause of death in intensive care units. Septic shock occurs when a body-wide infection leads to low blood pressure, and ultimately organ failure. Some recent studies suggest that overweight and obese patients have a better chance of survival following septic shock than normal or underweight patients. In this project we apply Mendelian randomization to assess whether the observed obesity effect on 28-day survival following septic shock is causal or more likely due to unmeasured confounding variables. Mendelian randomization is an instrumental variables approach that uses genetic markers as instruments. Under modelling assumptions, unconfounded estimates of the obesity effect can be obtained by fitting a model for 28-day survival that includes a residual obesity term. Data for the project comes from the Vasopressin and Septic Shock Trial (VASST). Our analysis suggests that the observed obesity effect on survival following septic shock is not causal.
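A toy linear sketch of the instrumental variables idea on simulated data (the project's actual model is for 28-day survival with a residual obesity term; the names and effect sizes here are hypothetical):

```python
import random

def ols_slope(x, y):
    """Least-squares slope and intercept of y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, my - (sxy / sxx) * mx

rng = random.Random(4)
n = 5000
# Hypothetical data: genetic score G raises BMI; unmeasured confounder U
# raises both BMI and the outcome; G affects the outcome only through BMI.
G = [rng.gauss(0, 1) for _ in range(n)]
U = [rng.gauss(0, 1) for _ in range(n)]
bmi = [0.5 * g + u + rng.gauss(0, 1) for g, u in zip(G, U)]
y = [u + rng.gauss(0, 1) for u in U]  # true causal effect of BMI is zero

# Naive regression of the outcome on BMI is biased by the confounder U.
naive, _ = ols_slope(bmi, y)
# Stage 1: predict BMI from the genetic instrument G alone.
b1, a1 = ols_slope(G, bmi)
fitted = [a1 + b1 * g for g in G]
# Stage 2: regress the outcome on instrument-predicted BMI (linear 2SLS;
# equivalent here to including the stage-1 residual in the outcome model).
causal, _ = ols_slope(fitted, y)
```

The naive slope is pushed away from zero by confounding, while the instrumented estimate recovers (approximately) the true null effect.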

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Brad McNeney
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

A Multi-Dimensional Bühlmann Credibility Approach to Modeling Multi-Population Mortality Rates

Author: 
Date created: 
2017-06-08
Abstract: 

In this project, we first propose a multi-dimensional Bühlmann credibility approach to forecasting mortality rates for multiple populations, and then compare its forecasting performance with that of the joint-k, co-integrated, and augmented common factor Lee-Carter models. The model is applied to mortality data from the Human Mortality Database for both genders of three well-developed countries over an age span and a wide range of fitting year spans. Empirical illustrations show that the proposed multi-dimensional Bühlmann credibility approach produces more accurate forecasts, as measured by MAPE (mean absolute percentage error), than those based on the Lee-Carter models.
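For context, here is a one-dimensional classical Bühlmann credibility estimator, sketched on fabricated mortality-improvement data (the thesis generalizes this to a multi-dimensional, multi-population setting):

```python
import statistics

def buhlmann(experience):
    """Classical Buhlmann credibility from a dict of population -> list of
    observations (e.g., mortality improvement rates over several years).
    Returns the credibility factor Z and the blended estimate per population."""
    means = {p: statistics.fmean(x) for p, x in experience.items()}
    grand = statistics.fmean(means.values())
    # Expected process variance: average within-population sample variance.
    epv = statistics.fmean(statistics.variance(x) for x in experience.values())
    # Variance of hypothetical means, corrected for sampling noise.
    n = statistics.fmean(len(x) for x in experience.values())
    vhm = max(statistics.variance(means.values()) - epv / n, 0.0)
    if vhm == 0.0:
        return 0.0, {p: grand for p in experience}
    z = n / (n + epv / vhm)
    # Credibility-weighted blend of own experience and the collective mean.
    return z, {p: z * m + (1 - z) * grand for p, m in means.items()}

# Hypothetical annual mortality improvement rates for three populations.
data = {"CA-male": [0.021, 0.018, 0.025, 0.020],
        "CA-female": [0.015, 0.013, 0.016, 0.014],
        "UK-male": [0.024, 0.027, 0.022, 0.026]}
z, blended = buhlmann(data)
```

Each blended forecast lies between the population's own mean and the collective mean, pulled toward the collective when within-population noise dominates.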

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Cary Chi-Liang Tsai
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Using AI and Statistical Techniques to Correct Play-by-play Substitution Errors

Author: 
Date created: 
2017-05-26
Abstract: 

Play-by-play is an important data source for basketball analysis, particularly for leagues that cannot afford the infrastructure for collecting video tracking data; it enables advanced metrics like adjusted plus-minus and lineup analysis like With Or Without You (WOWY). However, this analysis is not possible unless all substitutions are recorded and are correct. In this paper we use six seasons of play-by-play from the Canadian university league to derive a framework for automated cleaning of play-by-play that is littered with substitution logging errors. These errors include missing substitutions, an unequal number of players subbing in and out, a player's substitution pattern failing to alternate between in and out, and more. We define features to build a prediction model for identifying correctly and incorrectly recorded substitutions, and outline a simple player-activity heuristic for inferring the players not accounted for in the substitutions. We define two performance measures for objectively quantifying the effectiveness of this framework. The play-by-play that results from the algorithm opens up a set of statistics previously unobtainable for the Canadian university league, improving its analytics capabilities: coaches can improve strategy, leading to a more competitive product, and media can introduce modern statistics in their coverage to increase engagement from fans.
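A minimal sketch of one such consistency check (hypothetical event log; the thesis framework builds learned prediction models on top of checks like this): substitution directions can be validated against the set of players currently on the floor.

```python
def substitution_errors(events, roster):
    """Flag impossible substitution sequences in play-by-play.

    events: list of (player, direction) with direction "in" or "out".
    roster: players assumed on the floor at the period start.
    Returns a list of (index, player, problem) tuples.
    """
    on_floor = set(roster)
    errors = []
    for i, (player, direction) in enumerate(events):
        if direction == "in":
            if player in on_floor:
                errors.append((i, player, "subbed in but already on floor"))
            on_floor.add(player)
        else:
            if player not in on_floor:
                errors.append((i, player, "subbed out but not on floor"))
            on_floor.discard(player)
    return errors

# A missing "out" for player 7 makes the later "in" look like a duplicate.
log = [(7, "in"), (9, "out"), (7, "in")]
errs = substitution_errors(log, roster=[1, 2, 3, 4, 9])
```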

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An applied analysis of high-dimensional logistic regression

Author: 
Date created: 
2017-05-16
Abstract: 

In the high-dimensional setting, we investigate common regularization approaches for fitting logistic regression models with binary response variables. A literature review is provided on generalized linear models; regularization approaches including the lasso, ridge, elastic net, and relaxed lasso; and recent post-selection methods for obtaining p-values of coefficient estimates proposed by Lockhart et al. and Bühlmann et al. We consider varying n, p conditions and assess model performance based on several evaluation metrics, such as sparsity, accuracy, and algorithmic time efficiency. Through a simulation study, we find that Bühlmann et al.'s multi-sample splitting method performed poorly when selected covariates were highly correlated. When λ was chosen through cross-validation, the elastic net performed similarly to the lasso, but it did not possess the level of sparsity Zou and Hastie have suggested.
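As a self-contained illustration of how the lasso induces sparsity in logistic regression (a toy proximal-gradient fit on simulated data, not the solvers the study would use):

```python
import math
import random

def lasso_logistic(X, y, lam, lr=0.1, iters=400):
    """L1-penalised logistic regression fit by proximal gradient descent
    (ISTA): a gradient step on the logistic loss, then soft-thresholding,
    which sets weak coefficients exactly to zero (the lasso's sparsity)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        # Gradient of the average logistic loss: mean of (pred - y) * x.
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            pred = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            for j in range(p):
                grad[j] += (pred - yi) * xi[j] / n
        # Gradient step followed by the soft-threshold proximal operator.
        for j in range(p):
            wj = w[j] - lr * grad[j]
            w[j] = math.copysign(max(abs(wj) - lr * lam, 0.0), wj)
    return w

rng = random.Random(5)
n, p = 300, 8
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
# Only the first two covariates matter; the remaining six are noise.
y = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(2 * x[0] - 2 * x[1]))) else 0
     for x in X]
w = lasso_logistic(X, y, lam=0.1)
```

The fitted vector keeps the two signal coefficients (shrunken toward zero) while most noise coefficients are exactly zero.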

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Richard Lockhart
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Bayesian Sensitivity Analysis for Non-ignorable Missing Data in Longitudinal Studies

Author: 
Date created: 
2017-04-13
Abstract: 

The use of Bayesian statistical methods to handle missing data in biomedical studies has become popular in recent years. In this thesis, we propose a novel Bayesian sensitivity analysis (BSA) model that accounts for the influences of missing outcome data on the estimation of treatment effects in randomized controlled trials with non-ignorable missing data. We implement the method using the probabilistic programming language Stan, and apply it to data from the Vancouver At Home (VAH) Study, a randomized controlled trial that provided housing to homeless people with mental illness. We compare the results of BSA to those from an existing Bayesian longitudinal model that ignores missingness in the outcome. Furthermore, we demonstrate in a simulation study that, when a diffuse conservative prior that describes a range of assumptions about the bias effect is used, BSA credible intervals have greater length and higher coverage rate of the target parameters than existing methods, and that sensitivity increases as the percentage of missingness increases.
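A toy normal-approximation sketch of the sensitivity-analysis idea (not the Stan model in the thesis; the numbers are hypothetical): allowing a bias term with a wider prior inflates the interval for the treatment effect.

```python
import math

def bias_adjusted_interval(est, se, bias_sd):
    """Approximate 95% interval for a treatment effect after allowing for a
    non-ignorable-missingness bias term delta ~ N(0, bias_sd^2): under a
    normal approximation, the bias prior's variance simply adds to the
    sampling variance of the estimate."""
    half = 1.96 * math.sqrt(se ** 2 + bias_sd ** 2)
    return est - half, est + half

# Hypothetical effect estimate 1.2 with standard error 0.3: widening the
# bias prior from 0 (ignorable missingness) to 0.4 widens the interval.
naive = bias_adjusted_interval(1.2, 0.3, 0.0)
bsa = bias_adjusted_interval(1.2, 0.3, 0.4)
```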

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Lawrence McCandless
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Delta Hedging for Single Premium Segregated Fund

Author: 
Date created: 
2017-03-31
Abstract: 

Segregated funds are individual insurance contracts that offer growth potential of investment in underlying assets while providing a guarantee to protect part of the money invested. The guarantee can cause significant losses to the insurer, which makes it essential for the insurer to hedge this risk. In this project, we discuss the hedging effectiveness of delta hedging by studying the distribution of hedging errors under different assumptions about the return on underlying assets. We consider a geometric Brownian motion and a regime-switching lognormal model for equity returns, and compare the hedging effectiveness when risk-free rates are constant or stochastic. Two one-factor short-rate models, the Vasicek and CIR models, are used to model the risk-free rate. We find that delta hedging is in general effective, but large hedging errors can occur when the assumptions of the Black-Scholes framework are violated.
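A minimal sketch of discrete delta hedging under Black-Scholes assumptions (hypothetical contract parameters; the maturity guarantee is stylized as a European put):

```python
import math
import random
import statistics

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_put(s, k, r, sigma, tau):
    """Black-Scholes price of a European put (stand-in for the guarantee)."""
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return k * math.exp(-r * tau) * norm_cdf(-d2) - s * norm_cdf(-d1)

def bs_put_delta(s, k, r, sigma, tau):
    d1 = (math.log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return norm_cdf(d1) - 1.0

def hedging_error(s0, k, r, sigma, t, steps, rng):
    """Replicate the put along one GBM path with discrete delta rebalancing;
    returns the terminal hedge value minus the guarantee payoff."""
    dt = t / steps
    s = s0
    delta = bs_put_delta(s, k, r, sigma, t)
    cash = bs_put(s, k, r, sigma, t) - delta * s  # premium funds the hedge
    for i in range(1, steps + 1):
        s *= math.exp((r - 0.5 * sigma ** 2) * dt
                      + sigma * math.sqrt(dt) * rng.gauss(0, 1))
        cash *= math.exp(r * dt)  # cash accrues at the risk-free rate
        if i < steps:
            new_delta = bs_put_delta(s, k, r, sigma, t - i * dt)
            cash -= (new_delta - delta) * s  # self-financing rebalance
            delta = new_delta
    return delta * s + cash - max(k - s, 0.0)

rng = random.Random(6)
# Daily rebalancing over one year for an at-the-money guarantee.
errors = [hedging_error(100, 100, 0.02, 0.2, 1.0, 252, rng) for _ in range(500)]
mean_err = statistics.fmean(errors)
```

With daily rebalancing under the model's own assumptions, the hedging errors are centered near zero and small relative to the premium; the project studies how this distribution deteriorates when those assumptions fail.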

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Gary Parker
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.