Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

A bivariate longitudinal model for psychometric data

Author: 
Date created: 
2020-04-30
Abstract: 

Psychometric test data are useful for predicting a variety of important life outcomes and personality characteristics. The Cognitive Reflection Test (CRT) is a short, well-validated rationality test, designed to assess subjects' ability to override intuitively appealing but incorrect responses to a series of math- and logic-based questions. The CRT is predictive of many other cognitive abilities and tendencies, such as verbal intelligence, numeracy, and religiosity. Cognitive psychologists and psychometricians are concerned with whether subjects improve their scores on the test with repeated exposure, as this may threaten the test's predictive validity. This project uses the first publicly available longitudinal dataset derived from subjects who took the CRT multiple times over a predefined period. The dataset includes a multitude of predictors, including the number of previous exposures to the test (our variable of primary interest). Also included are two response variables measured with each test exposure: CRT score and time taken to complete the CRT. These responses serve as proxies for the underlying latent variables "rationality" and "reflectiveness", respectively. We propose methods to describe the relationship between the responses and selected predictors. Specifically, we employ a bivariate longitudinal model to account for the presumed dependence between our two responses. Our model also allows for subpopulations ("clusters") of individuals whose responses exhibit similar patterns. We estimate the parameters of our one- and two-cluster models via adaptive Gaussian quadrature. We also develop an Expectation-Maximization algorithm for estimating models with greater numbers of clusters. We use our fitted models to address a range of subject-specific questions in a formal way (building on earlier work relying on ad hoc methods). 
In particular, we find that test exposure has a greater estimated effect on test scores than previously reported and we find evidence of at least two subpopulations. Additionally, our work has generated numerous avenues for future investigation.
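
The quadrature step above can be illustrated in miniature. This is not the authors' bivariate model; it is a one-dimensional sketch of Gauss-Hermite quadrature for a Gaussian random effect, where, for simplicity, the recentring point and scale are taken as the known mean and standard deviation (a full adaptive scheme would use the mode and curvature of the integrand instead):

```python
import numpy as np

def gh_expectation(g, mu, sigma, n_nodes=15):
    """Approximate E[g(B)] for B ~ N(mu, sigma^2) using Gauss-Hermite
    quadrature, recentred and rescaled via the change of variables
    b = mu + sqrt(2) * sigma * x."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)  # nodes/weights for weight e^{-x^2}
    b = mu + np.sqrt(2.0) * sigma * x
    return np.sum(w * g(b)) / np.sqrt(np.pi)

# Sanity check against a closed form: E[exp(B)] = exp(mu + sigma^2 / 2)
approx = gh_expectation(np.exp, mu=0.0, sigma=1.0)
```

With 15 nodes the approximation of this smooth integrand is accurate to many digits; in mixed-model likelihoods the same rule is applied per subject to integrate out the random effects.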

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Statistical analysis of data from opioid use disorder study

Author: 
Date created: 
2020-04-24
Abstract: 

This project presents statistical analyses of data from a population-based opioid use disorder research program. The primary interest is in estimating the association of a range of demographic, clinical, and provider-related characteristics with retention in treatment for opioid use disorders. This focus was motivated by the province’s efforts to respond to the opioid overdose crisis, and by the methodological challenges inherent in analyzing the recurrent nature of opioid use disorder and its treatment episodes. We start by conducting a network analysis to clarify the influence of provider-related characteristics, including individual-, case-mix-, and prescriber-network-related characteristics, on treatment retention. We observe that the network characteristics have a statistically significant impact on OAT retention. We then begin our investigation into the importance of episode endings by fitting a Cox proportional hazards model with a gamma frailty that accounts for how the ending of one episode may affect subsequent episodes. Moreover, we consider three different analyses under multiple scenarios to reach our final goal of analyzing multi-type events. The OAT episode counts of the study subjects over the follow-up period are analyzed using Poisson regression models. Mixed-effects logistic regression analyses of the records of the OAT episode types are conducted. Lastly, we analyze the OAT episode duration times marginally via an estimating function approach. A robust variance estimator is derived for the estimator of the model parameters. In addition, we conduct a simulation study to verify the findings of the data analysis. The outcomes of the analyses indicate that the OAT episode counts and duration times are significantly associated with a few covariates, such as gender and birth era, and that these relationships vary according to the OAT episode type.
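
As a toy illustration of the episode-count analysis, here is a minimal log-link Poisson regression fitted by Newton-Raphson on simulated data; the covariate, coefficients, and sample size are invented for the sketch and are not taken from the study:

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Fit a log-link Poisson regression by Newton-Raphson scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)            # fitted means
        score = X.T @ (y - mu)           # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])   # Fisher information
        beta = beta + np.linalg.solve(info, score)
    return beta

# Simulated episode counts whose log-mean depends on a binary covariate
rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=2000)                 # e.g. a gender indicator
X = np.column_stack([np.ones(2000), x.astype(float)])
y = rng.poisson(np.exp(0.2 + 0.5 * x))            # true coefficients (0.2, 0.5)
beta_hat = fit_poisson(X, y)
```

The fitted coefficients recover the simulating values to within sampling error, the same logic the project's simulation study uses to verify its data-analysis findings.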

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
X. Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Incorporating statistical clustering methods into mortality models to improve forecasting performances

Author: 
Date created: 
2020-04-09
Abstract: 

Statistical clustering is a procedure for classifying a set of objects such that objects in the same class (called a cluster) are more similar to each other, with respect to some features or characteristics, than to those in other classes. In this project, we apply four clustering approaches to improve the forecasting performance of the Lee-Carter and CBD models. First, each of four clustering methods (Ward's hierarchical clustering, divisive hierarchical clustering, K-means clustering, and Gaussian mixture model clustering) is adopted to determine, based on some characteristics of mortality rates, the number and membership of age subgroups within the whole group of ages 25-84. Next, we forecast 10-year and 20-year mortality rates for each of the age subgroups using the Lee-Carter and CBD models, respectively. Finally, numerical illustrations are given with the R packages "NbClust" and "mclust" for clustering. Mortality data for both genders of the US and the UK are obtained from the Human Mortality Database, and the MAPE (mean absolute percentage error) measure is adopted to evaluate forecasting performance. Comparisons of MAPE values with and without clustering demonstrate that all the proposed clustering methods can improve the forecasting performance of the Lee-Carter and CBD models.
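
A minimal sketch of the clustering-plus-MAPE pipeline, using a plain NumPy implementation of Lloyd's K-means (not the NbClust/mclust routines used in the project) on invented one-dimensional features standing in for age-specific mortality summaries:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: assign points to the nearest centre, update centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None, :] - centres[None]) ** 2).sum(-1), axis=1)
        centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return labels, centres

def mape(actual, forecast):
    """Mean absolute percentage error, the forecasting accuracy measure used above."""
    return np.mean(np.abs((actual - forecast) / actual)) * 100.0

# Two well-separated groups of 1-D features (e.g. levels of age-specific rates)
X = np.concatenate([np.linspace(0.01, 0.02, 30),
                    np.linspace(0.10, 0.12, 30)])[:, None]
labels, _ = kmeans(X, k=2)
```

In the project's workflow, the resulting age subgroups would each get their own Lee-Carter or CBD fit, and MAPE would compare the subgroup forecasts with a whole-group fit.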

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Cary Chi-Liang Tsai
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Functional neural networks for scalar prediction

Author: 
Date created: 
2020-04-07
Abstract: 

We introduce a methodology for integrating functional data into densely connected feed-forward neural networks. The model is defined for scalar responses with at least one functional covariate and some number of scalar covariates. A by-product of the method is a set of functional parameters that evolve during the learning process, which aids interpretability. The model is shown to perform well in a number of contexts, including prediction of new data and recovery of the true underlying coefficient function; these results were confirmed through cross-validation and simulation studies. A collection of useful functions is built on top of the Keras/TensorFlow architecture, allowing for general use of the approach.
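
One common way to feed a functional covariate into a dense network is to reduce it to scalar features via basis inner products; the cosine basis and trapezoid rule below are illustrative choices for such a reduction, not necessarily the paper's exact construction:

```python
import numpy as np

def trapezoid(y, t):
    """Composite trapezoid rule for the integral of y over grid t."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)

def functional_features(x_vals, t_grid, n_basis=4):
    """Reduce a discretised functional covariate x(t) to scalar network
    inputs c_j = integral of x(t) * phi_j(t) dt, with a cosine basis phi_j."""
    T = t_grid[-1] - t_grid[0]
    feats = []
    for j in range(n_basis):
        phi = np.cos(j * np.pi * (t_grid - t_grid[0]) / T)  # j-th basis function
        feats.append(trapezoid(x_vals * phi, t_grid))
    return np.array(feats)

t = np.linspace(0.0, 1.0, 201)
x = 2.0 * np.ones_like(t)        # constant curve x(t) = 2
z = functional_features(x, t)    # only the constant basis mode is non-zero
```

The resulting vector z can be concatenated with scalar covariates as ordinary network input; letting the network learn the basis coefficients of the functional weight is what yields the interpretable functional parameters mentioned above.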

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Assessing the performance of an open spatial capture-recapture method on grizzly bear populations when age data is missing

Author: 
Date created: 
2020-02-13
Abstract: 

It is often difficult in capture-recapture (CR) studies of grizzly bear populations to determine the age of detected bears. As a result, analyses often omit age terms in CR models despite past studies suggesting that age influences detection probability. This paper explores how failing to account for age in the detection function of an open, spatially explicit CR model, introduced in Efford & Schofield (2019), affects estimates of apparent survival, apparent recruitment, population growth, and grizzly bear home-range size. Using a simulation study, it was found that estimates of all parameters of interest except home-range size were robust to this omission. The effects of using two different types of detectors for data collection (bait sites and rub objects) on bias in estimates of the above parameters were also explored via simulation. No evidence was found that one detector type was more prone to producing biased parameter estimates than the other.
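
For concreteness, a standard choice of detection function in spatial capture-recapture is the half-normal, whose scale parameter sigma is tied to home-range size; the parameter values below are invented for illustration and are not taken from the simulation study itself:

```python
import numpy as np

def halfnormal_detection(d, g0, sigma):
    """Half-normal detection function: probability of detecting an animal
    whose activity centre lies distance d from a detector. g0 is the
    detection probability at distance zero; sigma scales with home range."""
    return g0 * np.exp(-d ** 2 / (2.0 * sigma ** 2))

# Detection probability declines with distance; a larger sigma
# (a larger home range) flattens the decline.
p_near = halfnormal_detection(0.0, g0=0.3, sigma=2.0)
p_far = halfnormal_detection(4.0, g0=0.3, sigma=2.0)
```

An age effect on detection would typically enter by letting g0 or sigma vary with age class; omitting it collapses those classes onto a single curve, which is the misspecification the simulations probe.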

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Steven Thompson
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Optimal investment and consumption strategy for a retiree under stochastic force of mortality

Author: 
Date created: 
2020-01-15
Abstract: 

With the increase in self-directed retirement plans over the past few decades, more and more retirees are managing their retirement portfolios on their own. They therefore need to know the optimal amount of consumption they can afford each year and the optimal proportion of wealth they should invest in the financial market. In this project, we study the optimization strategy proposed by Delong and Chen (2016). Their model determines the optimal consumption and investment strategy for a retiree facing (1) a minimum lifetime consumption, (2) a stochastic force of mortality following a geometric Brownian motion process, (3) an annuity income, and (4) non-exponential discounting of future income. We use a modified version of the Cox, Ingersoll, and Ross (1985) model to capture the stochastic mortality intensity of the retiree and, subsequently, determine a new optimal consumption and investment strategy within their framework. We solve the resulting Hamilton-Jacobi-Bellman equation with an expansion method, perturbing the non-exponential discounting parameter in the associated partial differential equations.
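
A rough sketch of simulating a CIR-type mortality intensity with a full-truncation Euler scheme; the parameter values are invented for illustration, and this generic scheme is not the project's solution method (which works with the HJB equation analytically):

```python
import numpy as np

def simulate_cir(mu0, kappa, theta, sigma, T, dt, n_paths, seed=0):
    """Full-truncation Euler scheme for the CIR dynamics
    d mu_t = kappa (theta - mu_t) dt + sigma sqrt(mu_t) dW_t,
    read here as a mortality intensity mean-reverting to level theta."""
    rng = np.random.default_rng(seed)
    mu = np.full(n_paths, mu0)
    for _ in range(int(T / dt)):
        pos = np.maximum(mu, 0.0)  # truncation keeps the square root real
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        mu = mu + kappa * (theta - pos) * dt + sigma * np.sqrt(pos) * dW
    return mu

# Illustrative parameters satisfying the Feller condition 2*kappa*theta > sigma^2
terminal = simulate_cir(mu0=0.01, kappa=2.0, theta=0.03, sigma=0.1,
                        T=5.0, dt=0.01, n_paths=2000)
```

With these values the intensity mean-reverts from 0.01 toward the long-run level 0.03, so the cross-sectional mean at T = 5 sits close to theta.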

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jean-François Bégin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Foul Accumulation in the NBA

Author: 
Date created: 
2020-01-13
Abstract: 

This project investigates the fouling time distribution of players in the National Basketball Association. A Bayesian analysis is presented based on the assumption that fouling times follow a Gamma distribution. Various insights are obtained, including the observation that players accumulate their nth foul more quickly as n increases. Methods are developed that will allow coaches to better manage playing time in the presence of fouls so that key players are available in the latter stages of games.
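
One convenient Bayesian treatment of Gamma-distributed fouling times, assuming the shape is known and placing a conjugate Gamma prior on the rate, yields a closed-form posterior; the fouling times and hyperparameters below are hypothetical, not estimates from the project:

```python
import numpy as np

def posterior_rate(times, alpha, a0, b0):
    """Conjugate update: with fouling times T_i ~ Gamma(shape alpha, rate lam),
    alpha known, and prior lam ~ Gamma(a0, rate b0), the posterior is
    Gamma(a0 + n * alpha, rate b0 + sum(T_i))."""
    times = np.asarray(times, dtype=float)
    return a0 + alpha * times.size, b0 + times.sum()

# Hypothetical minutes between fouls for one player
times = [4.0, 6.5, 3.0, 8.0]
a_post, b_post = posterior_rate(times, alpha=2.0, a0=1.0, b0=1.0)
post_mean = a_post / b_post   # posterior mean of the fouling rate
```

Letting the rate (or shape) differ by foul number n is one way to formalize the observation that later fouls arrive more quickly.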

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Prediction for Canadian federal election aided by Canadian Community Health Survey

Author: 
Date created: 
2019-09-05
Abstract: 

This project aims to develop predictive models for Canadian federal elections. We begin with exploratory analyses of two sets of data: publicly accessible election data and data extracted from the Canadian Community Health Survey (CCHS) 2007-2018 on life satisfaction and other potentially associated sociodemographic characteristics. We propose to predict federal election outcomes using information on longitudinal Canadian life satisfaction. Specifically, we model whether a riding's federal election outcome changed from the previous election jointly with its longitudinal life satisfaction since that election. Election data from 2008 and 2011 and CCHS data from 2008-2011 are employed to fit the model via both two-stage estimation and maximum likelihood estimation using a Monte Carlo EM algorithm. The analysis results indicate that life satisfaction is an important factor in election prediction. It appears that young adults are more likely to vote for a change, but male voters are less likely to do so. Modeling the election outcomes with voter information versus CCHS respondents' information produces different estimation results. Two applications of the proposed approach are presented to further illustrate the joint modeling approach.
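
As a toy building block for the outcome side of such a joint model, here is a logistic regression for a binary "riding changed hands" indicator, fitted by Newton-Raphson on simulated data; the covariate, coefficients, and sample are invented and do not reproduce the project's two-stage or Monte Carlo EM fits:

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson scoring."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        W = p * (1.0 - p)                       # IRLS weights
        beta = beta + np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
    return beta

rng = np.random.default_rng(7)
satisfaction = rng.normal(0.0, 1.0, size=3000)   # standardized life satisfaction
X = np.column_stack([np.ones(3000), satisfaction])
p_true = 1.0 / (1.0 + np.exp(-(-0.5 + 0.8 * satisfaction)))
y = rng.binomial(1, p_true)                      # 1 = riding changed hands
beta_hat = fit_logistic(X, y)
```

In the joint model, this outcome equation would share random effects with a longitudinal model for the riding's life-satisfaction trajectory rather than being fitted on its own.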

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
X. Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An analysis of loan prepayment using competing risks random forests

Author: 
Date created: 
2019-11-27
Abstract: 

Loan prepayment is a major cause of loss to financial institutions that issue installment loans, yet predicting it for individual borrowers has not been well studied. Using a dataset of competing risks times for loan termination, we used competing risks random forests as a non-parametric approach for identifying useful predictors and for finding a tuned model demonstrating that loan prepayment can be predicted on an individual-borrower basis. In addition, largeRCRF, a new software package we developed, is introduced and evaluated for the purpose of training competing risks random forests on large-scale datasets. This research is a firm first step for financial institutions to reduce their prepayment rates and increase their margins.
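
Before growing forests, the competing-risks structure itself can be summarized nonparametrically. This sketch computes an Aalen-Johansen cumulative incidence on a toy dataset (invented, not the loan data, and unrelated to the largeRCRF implementation), ignoring tied event times for simplicity:

```python
import numpy as np

def cumulative_incidence(times, events, cause):
    """Nonparametric (Aalen-Johansen) cumulative incidence for one cause
    under competing risks; events: 0 = censored, 1, 2, ... = cause of failure."""
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    surv, cif = 1.0, 0.0
    at_risk = len(times)
    for e in events:
        if e == cause:
            cif += surv * (1.0 / at_risk)   # S(t-) times the cause-specific hazard
        if e != 0:
            surv *= 1.0 - 1.0 / at_risk     # all-cause Kaplan-Meier factor
        at_risk -= 1
    return cif

# Toy data: cause 1 = prepayment, cause 2 = default, 0 = censored
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 2, 0, 1]
cif_prepay = cumulative_incidence(times, events, cause=1)
```

A competing risks random forest estimates the same kind of cause-specific cumulative incidence, but conditionally on a borrower's covariates, by aggregating such estimators over the terminal nodes of many trees.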

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Ain’t played nobody: Building an optimal schedule to secure an NCAA tournament berth

Author: 
Date created: 
2019-08-12
Abstract: 

American men’s college basketball teams compete annually for the National Collegiate Athletic Association (NCAA) national championship, determined through a 68-team, single-elimination tournament known as “March Madness”. Tournament participants either qualify automatically, through their conferences’ year-end tournaments, or are chosen by a selection committee based on various measures of regular-season success. When selecting teams, the committee reportedly values a team's quality of, and performance against, opponents outside of its conference. Since teams have some freedom in selecting nonconference games, we seek to develop an approach to optimizing this choice. Using historical data, we identify the criteria the committee values most when selecting tournament teams. Additionally, we use prior seasons’ success and projected returning players to forecast every team’s strength for the upcoming season. Using the selection criteria and these projections, we develop a tool to help teams build the optimal nonconference schedule to increase their NCAA tournament selection probability.
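
The schedule-building idea can be caricatured as a small combinatorial search: choose the k nonconference opponents that maximize a toy selection-metric score. The scoring rule and the numbers below are invented for illustration and are not the committee's actual criteria or the project's fitted model:

```python
from itertools import combinations

def best_schedule(candidates, k):
    """Exhaustively pick the k opponents maximizing a toy score that
    rewards both opponent strength and the projected chance of winning."""
    def score(subset):
        return sum(strength * win_prob for strength, win_prob in subset)
    return max(combinations(candidates, k), key=score)

# (opponent strength, projected win probability) pairs -- illustrative only
candidates = [(0.9, 0.3), (0.7, 0.6), (0.5, 0.8), (0.3, 0.95)]
sched = best_schedule(candidates, k=2)
```

The toy objective already captures the tension the tool must balance: the strongest opponents carry low win probabilities, so the best schedule mixes quality with winnability rather than maximizing either alone.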

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Thomas Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.