Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Functional neural networks for scalar prediction

Author: 
Date created: 
2020-04-07
Abstract: 

We introduce a methodology for integrating functional data into densely connected feed-forward neural networks. The model is defined for scalar responses with at least one functional covariate and some number of scalar covariates. A by-product of the method is a set of functional parameters that change dynamically during the learning process, which leads to interpretability. The model is shown to perform well in a number of contexts, including prediction of new data and recovery of the true underlying coefficient function; these results were confirmed through cross-validation and simulation studies. A collection of useful functions is built on top of the Keras/TensorFlow architecture, allowing for general use of the approach.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Assessing the performance of an open spatial capture-recapture method on grizzly bear populations when age data is missing

Author: 
Date created: 
2020-02-13
Abstract: 

It is often difficult in capture-recapture (CR) studies of grizzly bear populations to determine the age of detected bears. As a result, analyses often omit age terms in CR models despite past studies suggesting that age influences detection probability. This paper explores how failing to account for age in the detection function of an open, spatially explicit CR model, introduced in Efford & Schofield (2019), affects estimates of apparent survival, apparent recruitment, population growth, and grizzly bear home-range sizes. A simulation study found that estimates of all parameters of interest, excluding home-range size, were robust to this omission. The effect of using two different types of detectors for data collection (bait sites and rub objects) on bias in estimates of the above parameters was also explored via simulation. No evidence was found that one detector type was more prone to producing biased parameter estimates than the other.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Steven Thompson
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Optimal investment and consumption strategy for a retiree under stochastic force of mortality

Author: 
Date created: 
2020-01-15
Abstract: 

With the increase in self-driven retirement plans over the past few decades, more and more retirees are managing their retirement portfolios on their own. Therefore, they need to know the optimal amount of consumption they can afford each year and the optimal proportion of wealth they should invest in the financial market. In this project, we study the optimization strategy proposed by Delong and Chen (2016). Their model determines the optimal consumption and investment strategy for a retiree facing (1) a minimum lifetime consumption, (2) a stochastic force of mortality following a geometric Brownian motion process, (3) an annuity income, and (4) non-exponential discounting of future income. We use a modified version of the Cox, Ingersoll, and Ross (1985) model to capture the stochastic mortality intensity of the retiree and, subsequently, determine a new optimal consumption and investment strategy using their framework. We use an expansion method to solve the classic Hamilton-Jacobi-Bellman equation by perturbing the non-exponential discounting parameter using partial differential equations.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jean-François Bégin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Foul Accumulation in the NBA

Author: 
Date created: 
2020-01-13
Abstract: 

This project investigates the fouling time distribution of players in the National Basketball Association. A Bayesian analysis is presented based on the assumption that fouling times follow a Gamma distribution. Various insights are obtained including the observation that players accumulate their nth foul more quickly for increasing n. Methods are developed that will allow coaches to better manage playing time in the presence of fouls such that key players are available in the latter stages of matches.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Prediction for Canadian federal election aided by Canadian Community Health Survey

Author: 
Date created: 
2019-09-05
Abstract: 

This project aims to develop predictive models for Canadian federal elections. We begin with explanatory analyses of two sets of data: publicly accessible election data and data extracted from the Canadian Community Health Survey (CCHS) 2007-2018 on life satisfaction and other potentially associated socio-demographic variables. We propose to predict federal election outcomes using information on longitudinal Canadian life satisfaction. Specifically, we model the change in a riding's federal election outcome from its previous election jointly with its longitudinal life satisfaction since the previous election. Election data from 2008 and 2011 and the CCHS data of 2008-2011 are employed to fit the model via both two-stage estimation and maximum likelihood estimation by the Monte Carlo EM algorithm. The analysis results indicate that life satisfaction is an important factor in election prediction. It appears that young adults are more likely to vote for a change, but male voters are less likely to do so. Using voter information or CCHS respondents' information to model the election outcomes produces different estimation results. Two applications are presented to further illustrate the proposed joint modeling approach.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
X. Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An analysis of loan prepayment using competing risks random forests

Author: 
Date created: 
2019-11-27
Abstract: 

Loan prepayment is a large cause of loss to financial institutions when they issue installment loans, and its prediction for individual borrowers has not been well studied. Using a dataset of competing risks times for loan termination, competing risks random forests were used as a non-parametric approach for identifying useful predictors and for finding a tuned model that demonstrated that loan prepayment can be predicted on an individual borrower basis. In addition, a new software package we developed, largeRCRF, is introduced and evaluated for the purpose of training competing risks random forests on large-scale datasets. This research is a firm first step for financial institutions to reduce their prepayment rates and increase their margins.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Ain’t played nobody: Building an optimal schedule to secure an NCAA tournament berth

Author: 
Date created: 
2019-08-12
Abstract: 

American men’s college basketball teams compete annually for the National Collegiate Athletic Association (NCAA) national championship, determined through a 68-team, single-elimination tournament known as “March Madness”. Tournament participants either qualify automatically, through their conferences’ year-end tournaments, or are chosen by a selection committee based on various measures of regular season success. When selecting teams, the committee reportedly values a team's quality of, and performance against, opponents outside of their conference. Since teams have some freedom in selecting nonconference games, we seek to develop an approach to optimizing this choice. Using historical data, we find the committee's most valued criteria for selecting tournament teams. Additionally, we use prior seasons’ success and projected returning players to forecast every team’s strength for the upcoming season. Using the selection criteria and these projections, we develop a tool to help teams build the optimal nonconference schedule to increase their NCAA tournament selection probability.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Thomas Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Advanced Monte Carlo methods and applications

Author: 
Date created: 
2019-08-16
Abstract: 

Monte Carlo methods have emerged as standard tools for Bayesian statistical inference in sophisticated models. Sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) are two main classes of methods for sampling from high-dimensional probability distributions. This thesis develops methodologies within these classes to address problems in different research areas. Phylogenetic tree reconstruction is a main task in evolutionary biology. Traditional MCMC methods may suffer from the curse of dimensionality and the local-trap problem. Firstly, we introduce a new combinatorial SMC method with a novel and efficient proposal distribution. We also explore combining SMC and Gibbs sampling to jointly estimate the phylogenetic trees and evolutionary parameters of genetic data sets. Secondly, we propose an "embarrassingly parallel" method for Bayesian phylogenetic inference, annealed SMC, based on recent advances in the SMC literature such as adaptive determination of annealing parameters. Another application of the methods presented in this thesis is in genome-wide association studies. Linear mixed models (LMMs) are powerful methods for controlling confounding caused by population structure. We develop a Bayesian hierarchical model to jointly estimate LMM parameters and the genetic similarity matrix using genetic sequences and phenotypes. We develop an SMC method to jointly approximate the posterior distributions of the LMM and phylogenetic trees. We also consider parameter estimation for nonlinear differential equation (DE) systems from noisy measurements of dynamic systems. We develop a fully Bayesian framework for nonlinear DE systems. A flexible nonparametric function is used to represent the dynamic process such that expensive numerical solvers can be avoided. We derive an SMC method to sample from multi-modal DE posterior distributions. In addition, we consider Bayesian computing problems related to importance sampling and misclassification in multinomial data. Lastly, motivated by a personalized recommender system with dynamic preference changes, we develop a new hidden Markov model (HMM) and propose an efficient online SMC algorithm for the HMM by hybridizing with the EM algorithm.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Shrinkage parameter estimation for penalized logistic regression analysis of case-control data

Author: 
Date created: 
2019-08-15
Abstract: 

In genetic epidemiology, rare-variant case-control studies aim to investigate the association between rare genetic variants and human diseases. Rare genetic variants lead to sparse covariates that are predominantly zeros, and this sparseness leads to estimators of log-odds-ratio parameters that are biased away from their null value of zero. Different penalized-likelihood methods have been developed to mitigate this sparse-data bias for case-control studies. In this project, we study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero. We propose a simple method to select the value of m based on a marginal likelihood. The marginal likelihood is maximized by the Monte Carlo EM algorithm. Properties of the proposed method are evaluated in a simulation study, and the method is applied to a real dataset from the ADNI-1 study.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Brad McNeney
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

A multi-state model for pricing critical illness insurance products

Author: 
Date created: 
2019-08-21
Abstract: 

Due to increasing cases of cancer and other severe illnesses, there is great demand for critical illness insurance products. This project introduces a Markovian multi-state model based on popular critical illness plans to describe the policyholder's health condition over time, which includes being diagnosed with certain dread diseases such as cancer, stroke and heart attack. Critical illness insurance products with life insurance or other optional riders are considered. Following the idea of Baione and Levantesi (Insurance: Mathematics and Economics, 58: 174-184, 2014), we focus on the method of modelling mortality rates, estimating transition probabilities with Canadian prevalence rates and incidence rates of covered illnesses, and calculating premium rates based on the multi-state model. Comparisons of transition intensities under various mortality models, and of premium rates for critical illness policies under several graduation approaches, are also illustrated.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Yi Lu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.