Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Recurrent event models: an application to offenders found not criminally responsible on account of mental disorder and their interactions with the health care and criminal justice systems

Author: 
Date created: 
2018-08-21
Abstract: 

Prior to committing an offence for which they are ultimately found not criminally responsible (NCR), offenders may have contact with the health care and criminal justice systems. Understanding the frequency of these contacts can potentially help to prevent such offences by informing strategies for intervention. In particular, escalation in contact frequency could foreshadow the committing of an index offence. Inspired by real data, in this project, we investigate models that describe such escalation. In particular, we consider two classes of models: time-to-event models that are framed in terms of numbers of contacts in an interval, and time-between-events models that are framed in terms of times between two successive contacts. Both classes of models can incorporate predictor variables and between-subject heterogeneity (via random effects). The properties of the maximum likelihood estimators of the escalation rate and the performance of the Kolmogorov-Smirnov test of goodness-of-fit are assessed using simulations under various scenarios.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

DB versus DC: a comparison of total compensation

Author: 
Date created: 
2018-07-06
Abstract: 

Employer-sponsored pension plans play an important role in providing employees with adequate retirement income. They are expensive and carry some important risks. The employer and its employees share these costs and risks differently depending on the plan design. In this project, two designs are studied, a defined benefit (DB) plan and a defined contribution (DC) plan. They are analyzed in a simple common business setup under the same stochastic economic scenarios generated from a calibrated VAR model. The employer’s total compensation budget is assumed to be constrained so that higher pension contributions are associated with lower salary increases, and vice versa. The two types of plans are compared based on the total compensation, defined as the value of wages and retirement income, received by 25 cohorts of new employees. On an adjusted basis, we find that the two types of plans provide equivalent total compensation to their members.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Gary Parker
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Net best-ball team composition in golf

Author: 
Date created: 
2018-08-07
Abstract: 

This project proposes a simple method of forming two-player and four-player golf teams for the purposes of net best-ball tournaments in stroke play format. The proposal is based on the recognition that variability is an important consideration in team composition; highly variable players contribute greatly in a best-ball setting. A theoretical derivation is provided for the proposed team formation. In addition, simulation studies are carried out which compare the proposal against other methods of team formation. In these studies, the proposed team composition leads to fairer competitions.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

New methods and models in functional data analysis

Author: 
Date created: 
2018-07-23
Abstract: 

Functional data analysis (FDA) plays an important role in analyzing function-valued data such as growth curves, medical images and electromagnetic spectrum profiles. Since dimension reduction can be achieved for infinite-dimensional functional data via functional principal component analysis (FPCA), this technique has attracted substantial attention. We develop an easy-to-implement algorithm to perform FPCA and find that this algorithm compares favorably with traditional methods in numerous applications. Knowing how random functions interact is critical to studying mechanisms like gene regulation and event-related brain activation. A new approach is proposed to calibrate dynamical correlations of random functions and we apply this approach to quantify functional connectivity from medical images. Scalar-on-function regression, which is widely used to characterize the relationship between a functional covariate and a scalar response, is an important ingredient of FDA. We propose several new scalar-on-function regression models and investigate their properties from both theoretical and practical perspectives.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Exploring spatio-temporal patterns in emergency department use for mental health reasons from children and adolescents in Alberta, Canada

Author: 
Date created: 
2018-07-26
Abstract: 

This project analyses mental health related emergency department visits from children and adolescents in Alberta, Canada to understand the spatio-temporal patterns and identify risk factors. The data are extracted for the period 2002-2011 from the provincial health administrative data systems of Alberta. A descriptive data analysis is presented and then generalized linear models are explored to model the spatio-temporal pattern of the emergency department visit counts. The seasonal effect is examined using seasonal factors, sine and cosine functions and cyclic cubic smoothing splines. The spatial and temporal correlation structures are modelled using random effects from an autoregressive model of order 1 and a conditionally autoregressive model. Demographic risk factors and their association with the frequency of mental health related emergency department visits are examined. Estimates of the model parameters are obtained and model diagnostics are performed to assess the fit of the model. Age, gender and a proxy for socio-economic status are found to be important risk factors. The proposed model can be used as a predictive model to help identify regions and groups at a higher risk for mental health related emergency department visits.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
X. Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

The use of submodels as a basis for efficient estimation of complex models

Author: 
Date created: 
2017-11-08
Abstract: 

In this thesis, we consider problems where the true underlying models are complex and obtaining the maximum likelihood estimator (MLE) of the true model is challenging or time-consuming. In our first paper, we investigate a general class of parameter-driven models for time series of counts. Depending on the distribution of the latent variables, these models can be highly complex. We consider a set of simple models within this class as a basis for estimating the regression coefficients in the more complex models. We also derive standard errors (SEs) for these new estimators. We conduct a comprehensive simulation study to evaluate the accuracy and efficiency of our estimators and their SEs. Our results show that, except in extreme cases, the maximizer of the Poisson generalized linear model (the simplest estimator in our context) is an efficient, consistent, and robust estimator with a well-behaved standard error. In our second paper, we work in the context of display advertising, where the goal is to estimate the probability of conversion (a pre-defined action such as making a purchase) after a user clicks on an ad. In addition to accuracy, in this context, the speed with which the estimate can be computed is critical. Again, computing the MLEs of the true model for the observed conversion statuses (which depends on the distribution of the delays in observing conversions) is challenging, in this case because of the huge size of the data set. We use a logistic regression model as a basis for estimation, and then adjust this estimate for its bias. We show that our estimation algorithm leads to accurate estimators and requires far less computation time than does the MLE of the true model. Our third paper also concerns the conversion probability estimation problem in display advertising. We consider a more complicated setting where users may visit an ad multiple times prior to taking the desired action (e.g., making a purchase). 
We extend the estimator that we developed in our second paper to incorporate information from such visits. We show that this new estimator, the DV-estimator (which accounts for the distributions of both the conversion delay times and the inter-visit times) is more accurate and leads to better confidence intervals than the estimator that accounts only for delay times (the D-estimator). In addition, the time required to compute the DV-estimate for a given data set is only moderately greater than that required to compute the D-estimate -- and is substantially less than that required to compute the MLE. In summary, in a variety of settings, we show that estimators based on simple, misspecified models can lead us to accurate, precise, and computationally efficient estimates of both the key model parameters and their standard deviations.

Document type: 
Thesis
Supervisor(s): 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Multivariate CACE analysis with an application to Arthritis Health Journal Study

Author: 
Date created: 
2018-05-07
Abstract: 

Treatment noncompliance is a common issue in randomized controlled trials that may undermine the randomization and bias the treatment effect estimation. The complier-average causal effect (CACE) model has become popular in estimating the method effectiveness under noncompliance. Performing multiple univariate CACE analyses separately fails to capture the potential correlations among multivariate outcomes, which can lead to biased estimates and significant loss of power in detecting an actual treatment effect. Motivated by the Arthritis Health Journal Study, we propose a multivariate CACE model to better account for the correlations among outcomes. In our simulation study, the global likelihood ratio test used to evaluate the treatment effect fails to control the type I error for moderate sample sizes. We therefore perform a parametric bootstrap test to address this issue. Our simulation results suggest that the multivariate CACE model outperforms multiple univariate CACE models in the precision of estimation and statistical power in the case of correlated multivariate outcomes.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Hui Xie
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

A hierarchical credibility approach to modelling mortality rates for multiple populations

Author: 
Date created: 
2018-05-08
Abstract: 

A hierarchical credibility model is a generalization of the Bühlmann credibility model and the Bühlmann-Straub credibility model with a tree structure of four or more levels. This project aims to incorporate the hierarchical credibility theory, which is used in property and casualty insurance, to model the dependency of multi-population mortality rates. The forecasting performances of the three/four/five-level hierarchical credibility models are compared with those of the classical Lee-Carter model and its three extensions for multiple populations (joint-k, cointegrated and augmented common factor Lee-Carter models). Numerical illustrations based on mortality data for both genders of the US, the UK and Japan, with a series of fitting year spans and three forecasting periods, show that the hierarchical credibility approach yields more accurate forecasts as measured by the AMAPE (average of mean absolute percentage errors). The proposed model is convenient to implement and can be further applied to projecting a mortality index for pricing mortality-indexed securities.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Cary Chi-Liang Tsai
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Understanding multicollinearity in Bayesian model averaging with BIC approximation

Author: 
Date created: 
2018-04-23
Abstract: 

Bayesian model averaging (BMA) is a widely used method for model and variable selection. In particular, BMA with Bayesian Information Criterion (BIC) approximation is a frequentist view of model averaging which saves a massive amount of computation compared to the fully Bayesian approach. However, BMA with BIC approximation may give misleading results in linear regression models when multicollinearity is present. In this article, we explore the relationship between performance of BMA with BIC approximation and the true regression parameters and correlations among explanatory variables. Specifically, we derive approximate formulae in the context of a known regression model to predict the BMA behaviours in three aspects: model selection, variable importance and coefficient estimation. We use simulations to verify the accuracy of the approximations. Through mathematical analysis, we demonstrate that BMA may not identify the correct model as the highest probability model if the coefficient and correlation parameters combine to minimize the residual sum of squares of a wrong model. We find that if the regression parameters of important variables are relatively large, BMA is generally successful in model and variable selection. On the other hand, if the regression parameters of important variables are relatively small, BMA can be unreliable in predicting the best model or important variables, especially when the full model correlation matrix is close to singular. The simulation studies suggest that our formulae are over-optimistic in predicting posterior probabilities of the true models and important variables. However, these formulae still provide insight into the effect of collinearity on BMA.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Thomas M. Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Decomposing the RV coefficient to identify genetic markers associated with changes in brain structure

Author: 
Date created: 
2018-04-13
Abstract: 

Alzheimer’s disease (AD) is a chronic neurodegenerative disease that causes memory loss and decline in cognitive abilities; it is the sixth leading cause of death in the United States, affecting an estimated 5 million Americans and 747,000 Canadians. A recent study of AD pathogenesis (Szefer et al., 2017) used the RV coefficient to measure linear association between multiple genetic variants and multiple measurements of structural changes in the brain, using data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The authors decomposed the RV coefficient into contributions from individual variants and displayed these contributions graphically. In this project, we investigate the properties of such a “contribution plot” in terms of an underlying linear model, and discuss estimation of the components of the plot when the correlation signal may be sparse. The contribution plot is applied to genomic and brain imaging data from the ADNI-1 study, and to data simulated under various scenarios.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Brad McNeney
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.