Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Using AI and Statistical Techniques to Correct Play-by-play Substitution Errors

Author: 
Date created: 
2017-05-26
Abstract: 

Play-by-play is an important data source for basketball analysis, particularly for leagues that cannot afford the infrastructure for collecting video tracking data; it enables advanced metrics such as adjusted plus-minus and lineup analyses such as With Or Without You (WOWY). However, such analysis is not possible unless all substitutions are recorded correctly. In this paper we use six seasons of play-by-play from the Canadian university league to derive a framework for automated cleaning of play-by-play that is littered with substitution logging errors. These errors include missing substitutions, unequal numbers of players subbing in and out, a player's substitution pattern failing to alternate between in and out, and more. We define features to build a prediction model for identifying correctly and incorrectly recorded substitutions, and we outline a simple heuristic for player activity that infers the players who were not accounted for in the substitutions. We define two performance measures for objectively quantifying the effectiveness of this framework. The play-by-play produced by the algorithm opens up a set of statistics that were previously unobtainable for the Canadian university league, improving its analytics capabilities: coaches can refine strategy, leading to a more competitive product, and media can introduce modern statistics into their coverage to increase fan engagement.
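
A minimal sketch of the kind of alternation check described above, assuming a hypothetical event layout of (player, direction) pairs in game order; this illustrates the error type, and is not the authors' actual feature set:

    # Flag players whose recorded substitutions fail to alternate between
    # "in" and "out" -- one of the logging errors described in the abstract.
    def flag_non_alternating(events):
        last_dir = {}
        flagged = set()
        for player, direction in events:
            if last_dir.get(player) == direction:
                flagged.add(player)   # two consecutive "in"s or two consecutive "out"s
            last_dir[player] = direction
        return flagged

    events = [("A", "in"), ("B", "in"), ("A", "out"), ("B", "in")]
    print(flag_non_alternating(events))   # {'B'}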

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An applied analysis of high-dimensional logistic regression

Author: 
Date created: 
2017-05-16
Abstract: 

In the high-dimensional setting, we investigate common regularization approaches for fitting logistic regression models with binary response variables. A literature review covers generalized linear models; regularization approaches including the lasso, ridge, elastic net, and relaxed lasso; and recent post-selection methods for obtaining p-values of coefficient estimates proposed by Lockhart et al. and Bühlmann et al. We consider varying (n, p) conditions and assess model performance on several evaluation metrics, such as sparsity, accuracy, and algorithmic time efficiency. Through a simulation study, we find that Bühlmann et al.'s multi-sample splitting method performed poorly when the selected covariates were highly correlated. When λ was chosen through cross-validation, the elastic net performed similarly to the lasso, but it did not possess the level of sparsity that Zou and Hastie have suggested.
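
As a point of reference, a minimal sketch of fitting an elastic-net-penalized logistic regression with cross-validated regularization, here using scikit-learn (an assumption for illustration; the abstract does not specify the project's software):

    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV

    rng = np.random.default_rng(0)
    n, p = 100, 500                        # high-dimensional: p >> n
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[:5] = 2.0                         # sparse truth: five active covariates
    y = (X @ beta + rng.standard_normal(n) > 0).astype(int)

    # Elastic net requires the saga solver; lambda (here 1/C) is chosen by CV.
    fit = LogisticRegressionCV(penalty="elasticnet", solver="saga",
                               l1_ratios=[0.5], Cs=10, max_iter=5000).fit(X, y)
    print("nonzero coefficients:", int(np.sum(fit.coef_ != 0)))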

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Richard Lockhart
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Bayesian Sensitivity Analysis for Non-ignorable Missing Data in Longitudinal Studies

Author: 
Date created: 
2017-04-13
Abstract: 

The use of Bayesian statistical methods to handle missing data in biomedical studies has become popular in recent years. In this thesis, we propose a novel Bayesian sensitivity analysis (BSA) model that accounts for the influences of missing outcome data on the estimation of treatment effects in randomized controlled trials with non-ignorable missing data. We implement the method using the probabilistic programming language Stan, and apply it to data from the Vancouver At Home (VAH) Study, a randomized controlled trial that provided housing to homeless people with mental illness. We compare the results of BSA to those from an existing Bayesian longitudinal model that ignores missingness in the outcome. Furthermore, we demonstrate in a simulation study that, when a diffuse conservative prior describing a range of assumptions about the bias effect is used, BSA credible intervals have greater length and higher coverage of the target parameters than existing methods, and that sensitivity increases as the percentage of missingness increases.
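
A conceptual sketch of the sensitivity-analysis idea, with illustrative numbers only (the actual model is a longitudinal Stan model, not this two-line adjustment): the treatment effect is shifted by a bias parameter describing how the outcomes of dropouts may differ, with a diffuse prior over that parameter.

    import numpy as np

    rng = np.random.default_rng(1)
    effect = rng.normal(0.5, 0.15, 10_000)   # stand-in posterior ignoring missingness
    p_miss = 0.3                             # fraction of outcomes missing
    delta = rng.normal(0.0, 0.4, 10_000)     # diffuse conservative prior on bias effect
    adjusted = effect + p_miss * delta       # bias-adjusted treatment effect

    for name, draws in [("ignoring missingness", effect), ("BSA-adjusted", adjusted)]:
        lo, hi = np.percentile(draws, [2.5, 97.5])
        print(f"{name}: 95% CrI = ({lo:.2f}, {hi:.2f})")

The wider adjusted interval mirrors the abstract's finding that BSA credible intervals lengthen as assumptions about the missing outcomes are relaxed.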

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Lawrence McCandless
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Delta Hedging for Single Premium Segregated Fund

Author: 
Date created: 
2017-03-31
Abstract: 

Segregated funds are individual insurance contracts that offer the growth potential of investment in underlying assets while providing a guarantee to protect part of the money invested. The guarantee can cause significant losses to the insurer, which makes it essential for the insurer to hedge this risk. In this project, we discuss the effectiveness of delta hedging by studying the distribution of hedging errors under different assumptions about the return on underlying assets. We consider a geometric Brownian motion and a regime-switching lognormal model for equity returns and compare hedging effectiveness when risk-free rates are constant or stochastic. Two one-factor short-rate models, the Vasicek and CIR models, are used to model the risk-free rate. We find that delta hedging is generally effective, but large hedging errors can occur when the assumptions of the Black-Scholes framework are violated.
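
A minimal sketch of the hedging-error experiment under the simplest setting above (GBM equity, constant risk-free rate, guarantee treated as a European put); all parameter values are illustrative rather than taken from the project:

    import numpy as np
    from scipy.stats import norm

    def put_delta(S, K, r, sigma, tau):
        d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
        return norm.cdf(d1) - 1.0

    S0, K, r, sigma, T = 100.0, 100.0, 0.03, 0.2, 10.0
    steps, n_paths = 120, 5000
    dt = T / steps
    rng = np.random.default_rng(2)

    # Start the self-financing hedge with the Black-Scholes put premium.
    d1 = (np.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    premium = K * np.exp(-r * T) * norm.cdf(sigma * np.sqrt(T) - d1) - S0 * norm.cdf(-d1)
    S = np.full(n_paths, S0)
    delta = put_delta(S, K, r, sigma, T)
    cash = premium - delta * S
    for i in range(1, steps + 1):
        S *= np.exp((r - 0.5 * sigma**2) * dt
                    + sigma * np.sqrt(dt) * rng.standard_normal(n_paths))
        cash *= np.exp(r * dt)                      # accrue interest
        tau = T - i * dt
        new_delta = put_delta(S, K, r, sigma, tau) if tau > 0 else 0.0
        cash -= (new_delta - delta) * S             # rebalance the stock position
        delta = new_delta
    error = cash - np.maximum(K - S, 0.0)           # hedge value minus guarantee payoff
    print(f"mean hedging error {error.mean():.3f}, std {error.std():.3f}")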

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Gary Parker
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Analysis of Target Benefit Plans with Aggregate Cost Method

Author: 
Date created: 
2017-04-06
Abstract: 

The operational characteristics of a target benefit plan based on an aggregate pension cost method are studied through simulation under a multivariate time series model for projected interest rates and equity returns. The performance of the target benefit plan is evaluated by applying a variety of performance metrics for benefit security, benefit adequacy, benefit stability and intergenerational equity. Performance is shown to improve when the economy remains relatively stable over time and when the choice of valuation rate does not create persistent gains or losses.
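
For context, the aggregate cost method sets the contribution rate at each valuation along the following lines (standard textbook form; notation ours, and the project's exact specification may differ):

    c_t = \frac{\mathrm{PVFB}_t - F_t}{\mathrm{PVFS}_t}

where PVFB_t is the actuarial present value of future benefits, F_t the fund assets, and PVFS_t the present value of future salaries at valuation date t; the dollar contribution is c_t applied to current payroll. Because gains and losses are spread implicitly over future salaries, the choice of valuation rate drives the persistent gains or losses noted above.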

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Gary Parker
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Predictive Estimation in Canadian Federal Elections

Date created: 
2017-04-20
Abstract: 

Various estimation methods are employed to provide seat projections during Canadian federal elections. This project explores discrepancies between the actual outcomes of recent Canadian federal elections and the predictions of existing approaches, such as those proposed by Grenier and Rosenthal. Each seat projection procedure rests on a set of assumptions, but these assumptions are not explicitly listed in the accessible references. We formulate the assumptions required by the two prediction procedures proposed by Rosenthal, and present variance estimation procedures. Departures from the assumptions are explored with real data from the 2006, 2008, 2011, and 2015 federal elections. An extensive simulation study examines the potential impacts of various deviations from the assumptions. The simulation indicates that, compared with other assumption violations, misleading polling results may cause the most damage to the prediction. In addition, we find that the prediction is least affected by changes in the number of voters, and that heterogeneity of riding patterns within a region may not affect the prediction at the national level.
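
A toy sketch of a proportional-swing projection with a deliberately biased poll, the kind of assumption violation examined above (synthetic ridings and hypothetical numbers; this is not Rosenthal's exact procedure):

    import numpy as np

    rng = np.random.default_rng(3)
    n_ridings, parties = 50, 3
    prev = rng.dirichlet(np.ones(parties) * 8, size=n_ridings)  # last election's riding shares
    prev_nat = prev.mean(axis=0)          # previous national shares (equal riding sizes assumed)

    poll_true = np.array([0.40, 0.35, 0.25])
    for poll_bias in [0.00, 0.03]:        # a misleading poll overstates party 0
        poll = poll_true + np.array([poll_bias, -poll_bias, 0.0])
        swing = poll / prev_nat           # proportional swing factors
        proj = prev * swing               # projected riding-level shares
        seats = np.bincount(proj.argmax(axis=1), minlength=parties)
        print(f"poll bias {poll_bias:+.2f}: projected seats {seats}")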

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Stochastic Modelling and Comparison of Two Pension Plans

Author: 
Date created: 
2017-04-19
Abstract: 

In this project, we simulate the operation of a stylized jointly sponsored pension plan (JSPP) and a stylized defined contribution (DC) plan with identical contribution patterns using a vector autoregressive model for key economic variables. The performance of the two plans is evaluated by comparing the distribution of pension ratios for a specific cohort of new entrants. We find that the DC plan outperforms the JSPP in terms of expected pension ratio, and experiences only a moderate degree of downside risk. This downside risk is not enough to outweigh the upside potential even for a relatively risk-averse member, as reflected in the expected discounted utility of benefits under the two plans. Under more sophisticated rate stabilization techniques, the probability that the DC plan outperforms the JSPP increases. When the bond yield and stock return processes begin from values far above their long-term means (not far below, as is the case today), the DC plan is projected to outperform the JSPP even more frequently, because the higher required contributions accrue to the advantage of the individual member only, instead of also financing benefits for others.
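
A minimal sketch of the kind of economic scenario generator described above: a VAR(1) process for bond yields and stock returns, with illustrative coefficients rather than the fitted values from the project:

    import numpy as np

    rng = np.random.default_rng(4)
    mu = np.array([0.03, 0.06])              # long-term means: bond yield, stock return
    A = np.array([[0.85, 0.00],
                  [0.10, 0.20]])             # lag-1 coefficient matrix
    cov = np.array([[1.0e-5, 2.0e-5],
                    [2.0e-5, 2.5e-3]])       # innovation covariance
    chol = np.linalg.cholesky(cov)

    years = 40
    x = np.tile(mu, (years + 1, 1))          # start both processes at their means
    for t in range(1, years + 1):
        x[t] = mu + A @ (x[t - 1] - mu) + chol @ rng.standard_normal(2)

Starting the recursion above or below mu, as in the final finding of the abstract, simply changes x[0].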

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Barbara Sanders
Gary Parker
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Analysis of the Bitcoin Exchange using Particle MCMC Methods

Author: 
Date created: 
2017-03-24
Abstract: 

Stochastic volatility models (SVMs) are commonly used to model time series data. They have many applications in finance and are useful tools for describing the evolution of asset returns. The motivation for this project is to determine whether stochastic volatility models can be used to model Bitcoin exchange rates in a way that contributes to an effective trading strategy. We consider a basic SVM and several extensions that include fat tails, leverage, and covariate effects. A Bayesian approach with the particle Markov chain Monte Carlo (PMCMC) method is employed to estimate the model parameters. We assess the fit of the estimated models using the deviance information criterion (DIC). Simulation studies are conducted to assess the performance of particle MCMC and to compare it with the traditional MCMC approach. We then apply the proposed method to the Bitcoin exchange rate data and compare the effectiveness of each type of SVM.
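
A minimal sketch of the bootstrap particle filter that PMCMC wraps a Metropolis-Hastings sampler around, for the basic SVM h_t = mu + phi (h_{t-1} - mu) + sigma eta_t with y_t ~ N(0, exp(h_t)); parameter values are placeholders:

    import numpy as np

    def pf_loglik(y, mu, phi, sigma, n_particles=500, seed=0):
        rng = np.random.default_rng(seed)
        # Initialize from the stationary distribution of the AR(1) log-volatility.
        h = rng.normal(mu, sigma / np.sqrt(1 - phi**2), n_particles)
        loglik = 0.0
        for yt in y:
            h = mu + phi * (h - mu) + sigma * rng.standard_normal(n_particles)
            # Log N(0, exp(h)) density of the observation for each particle.
            logw = -0.5 * (np.log(2 * np.pi) + h + yt**2 * np.exp(-h))
            m = logw.max()
            w = np.exp(logw - m)
            loglik += m + np.log(w.mean())
            h = rng.choice(h, size=n_particles, p=w / w.sum())  # multinomial resampling
        return loglik

    y = np.random.default_rng(1).standard_normal(200) * 0.02    # stand-in "returns"
    print(pf_loglik(y, mu=np.log(4e-4), phi=0.95, sigma=0.2))

PMCMC uses this unbiased likelihood estimate in place of the intractable exact likelihood inside the acceptance ratio.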

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Bayesian methods for multi-modal posterior topologies

Date created: 
2017-04-18
Abstract: 

The purpose of this thesis is to develop efficient Bayesian methods to address multi-modality in posterior topologies. In Chapter 2 we develop a new general Bayesian methodology that simultaneously estimates the parameters of interest and the probability of the model. The proposed methodology builds on the Simulated Tempering algorithm, a powerful sampling algorithm for multi-modal distributions that is nonetheless difficult to use in practice because it requires choosing a suitable prior for the temperature and a temperature schedule. Our proposed algorithm removes this requirement while preserving the sampling efficiency of the Simulated Tempering algorithm. We illustrate the applicability of the new algorithm on examples involving mixtures of Gaussian distributions and ordinary differential equation models. Chapter 3 proposes a general optimization strategy that combines results from different optimization or parameter estimation methods to overcome the shortcomings of any single method. Embedding the proposed optimization strategy in the Incremental Mixture Importance Sampling with Optimization algorithm (IMIS-Opt) significantly improves sampling efficiency and removes IMIS-Opt's dependence on the choice of prior. We demonstrate that the resulting algorithm provides accurate parameter estimates, whereas IMIS-Opt gets trapped in a local mode in the case of ordinary differential equation (ODE) models. Finally, the resulting algorithm is implemented within the Approximate Bayesian Computation framework to draw likelihood-free inference. Chapter 4 introduces a generalization of the Bayesian Information Criterion (BIC) that handles multi-modality in the posterior space. The BIC is a computationally efficient model selection tool, but it relies on the assumption that the posterior distribution is unimodal. When the posterior is multi-modal, the BIC uses only one posterior mode and discards the information from the remaining modes. We demonstrate that the BIC produces inaccurate estimates of the posterior probability of a bimodal model, which in some cases results in the BIC selecting a sub-optimal model. As a remedy, we propose a Multi-modal BIC (MBIC) that incorporates all relevant posterior modes while preserving the computational efficiency of the BIC. The accuracy of the MBIC is demonstrated through bimodal models and mixtures of Gaussian distributions.
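
The Chapter 4 construction can be summarized schematically (our paraphrase, under the usual Laplace-approximation assumptions): where the BIC approximates the marginal likelihood around a single mode, a multi-modal version sums the contributions of all K detected modes,

    p(y \mid M) \;\approx\; \sum_{j=1}^{K} L(\hat{\theta}_j)\, \pi(\hat{\theta}_j)\, (2\pi)^{d/2}\, \lvert H_j \rvert^{-1/2}

where L is the likelihood, \pi the prior, d the parameter dimension, and H_j the Hessian of the negative log-posterior at mode \hat{\theta}_j; the usual BIC corresponds to keeping one mode and discarding the O(1) factors, leaving -2 \log L(\hat{\theta}) + d \log n.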

Document type: 
Thesis
Supervisor(s): 
David Campbell
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Marginal Loglinear Models for Three Multiple-Response Categorical Variables

Author: 
Date created: 
2016-12-09
Abstract: 

Many survey questions include a phrase like “Choose all that apply”, which lets respondents choose any number of options from predefined lists of items. Responses to these questions result in multiple-response categorical variables (MRCVs). This thesis focuses on analyzing and modeling three MRCVs. There are 232 possible models representing different combinations of associations. Parameters are estimated using generalized estimating equations generated by a pseudo-likelihood, and the variances of the estimators are corrected using sandwich methods. Due to the large number of possible models, model comparisons based on nested models would be inappropriate. As an alternative, model averaging is proposed as a model comparison tool and as a way to account for model selection uncertainty. Further, the calculations required for computing the variance of the estimators can exceed 32-bit machine capacity even for a moderately large number of items. This issue is addressed by reducing the dimensions of the matrices.
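
The sandwich correction mentioned above takes the standard form (notation ours):

    \widehat{\operatorname{Var}}(\hat{\beta}) \;=\; A^{-1} B A^{-1}, \qquad
    A = -\left.\frac{\partial^2 \ell_p(\beta)}{\partial \beta\, \partial \beta^\top}\right|_{\beta = \hat{\beta}}, \qquad
    B = \sum_{i=1}^{n} u_i(\hat{\beta})\, u_i(\hat{\beta})^{\top}

where \ell_p is the pseudo-log-likelihood and u_i(\beta) is its score contribution from subject i; B replaces the model-based variance, which is unreliable under a working-independence pseudo-likelihood, with an empirical estimate.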

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Thomas Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.