Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Statistical inference using large administrative data on multiple event times, with application to cancer survivorship research

Author: 
Date created: 
2018-12-20
Abstract: 

Motivated by the breast cancer survivorship research program at BC Cancer Agency, this dissertation develops statistical approaches to analyzing right-censored multivariate event time data. Following the background and motivation of the research, we introduce the framework of the dissertation, and provide a literature review and a list of the research questions. A description of the motivating study data is then given, together with a preliminary analysis, before the proposed approaches are presented as follows. We first consider estimation of the joint survivor function of multiple event times when the observations are subject to informative censoring due to a terminating event. We formulate the potential dependence between the multiple event times and the time to the terminating event using Archimedean copulas. This may account for the informative censoring and, at the same time, allows us to adapt the commonly used two-step procedure for estimating the joint distribution of the multiple event times under a copula model. We propose an easy-to-implement pseudo-likelihood based estimation procedure under the model, which reduces computational intensity compared to its MLE counterpart. A more flexible approach to handling informative censoring is then proposed, with particular attention to observations on a bivariate event time potentially censored by a terminating event. We formulate the correlation of the bivariate event time with the censoring time by embedding the bivariate event time distribution in a bivariate copula model. This yields the convenience of inference under the conventional copula model. At the same time, the proposed model is more flexible, and thus potentially more appropriate in many practical situations, than modeling the event times and the associated censoring time jointly by a single multivariate copula. Adapting the commonly used two-stage estimation procedure under a copula model, we develop an easy-to-implement estimator for the joint survivor function of the two event times. A by-product of the proposed approaches is an estimator for the marginal distribution of a single event time with semicompeting-risks data. Further, we extend the approach to regression settings to explore covariate effects in either parametric or nonparametric forms. In particular, adjusting for some covariates, we compare two populations based on an event time with observations subject to informative censoring. We conduct both asymptotic and simulation studies to examine the consistency, efficiency, and robustness of the proposed approaches. The breast cancer program that motivated this research is employed to illustrate the methodological development throughout the dissertation.

Document type: 
Thesis
File(s): 
Senior supervisor: 
X. Joan Hu
John J. Spinelli
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Masquerade detection: A topic model based approach

Date created: 
2018-12-19
Abstract: 

The goal of masquerade detection is to "detect" when an intruder has infiltrated a computer system by looking for evidence of malicious behaviour. In this project, I use a topic model based intrusion detection system to search for intruders within the SEA and Greenberg datasets of Unix computer commands. Using LDA topic modeling, I was able to find a probability distribution for each user over the topics, both for a block of commands and for each command. Using these two probability distributions and building on previous detection techniques, I was able to create five different detection techniques. I describe how I created the five LDA-based models and combine them to obtain a sixth model. All of these techniques performed on par with their non-LDA counterparts. Therefore, given the reduction in dimensionality afforded by the LDA topic model, I conclude that my methods perform better than the current models.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Derek Bingham
David Campbell
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Partial stratification in capture-recapture experiments and integrated population modeling with radio telemetry

Date created: 
2018-12-19
Abstract: 

In this thesis, we develop and apply three new methods for ecological data sets. We present two new developments related to capture-recapture studies and one development related to integrated population modeling. In the first project, we present new methods using partial stratification in two-sample capture-recapture experiments for closed populations. Capture heterogeneity is known to cause bias in estimates of abundance in capture-recapture experiments. This heterogeneity is often related to observable fixed characteristics of the animals such as sex. If this information can be observed for each handled animal at both sample occasions, then it is straightforward to stratify (e.g. by sex) and obtain stratum-specific estimates. However, in many fishery experiments it is difficult to sex all captured fish because morphological differences are slight or because of logistical constraints. In these cases, a sub-sample of the captured fish at each sample occasion is selected and additional, often more costly, measurements are made, such as sex determination through sacrificing the fish. We develop new methods to estimate abundance for these types of experiments. Furthermore, we develop methods for optimal allocation of effort for a given cost. We also develop methods to account for additional information (e.g. prior information about the sex ratio) and for supplemental continuous covariates such as length. These methods are applied to a problem of estimating the size of the walleye population in Mille Lacs Lake, Minnesota, USA. In the second project, we present new methods using partial stratification in k-sample (k>=2) capture-recapture experiments of a closed population with known losses on capture to estimate abundance. We present the new methods for large populations using maximum likelihood and a Bayesian method, and simulated data with known losses on capture were used to illustrate the new methods. In the third project, we present an integrated population model using capture-recapture, dead recovery, snorkel, and radio telemetry surveys. We apply this model to Chinook salmon on the West Coast of Vancouver Island, Canada to estimate spawning escapement and to describe the movement from the ocean to spawning grounds considering the stopover time, stream residence time, and snorkel survey observer efficiency.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Carl Schwarz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Cooperation in target benefit plans: A game theoretical perspective

Author: 
Date created: 
2018-12-12
Abstract: 

Many occupational pension plans rely on intergenerational cooperation to deliver stable retirement benefits; however, this cooperation has natural limits and exceeding these limits can threaten the sustainability of the plan. In this project, we cast the problem of intergenerational cooperation within funded pension plans in a game theoretic framework that incorporates overlapping generations and uncertainty in the cost of cooperation. Employing the concept of a subgame perfect equilibrium, we determine the threshold above which cooperation should not be enforced. Using two different processes for the stochastic cost of cooperation, we illustrate the combination of parameters that allow for the existence of a reasonable threshold, and study how the level of prefunding and the stochastic process parameters affect both the threshold and the probability of sanctioned non-cooperation.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Recurrent event models: an application to offenders found not criminally responsible on account of mental disorder and their interactions with the health care and criminal justice systems

Author: 
Date created: 
2018-08-21
Abstract: 

Prior to committing an offence for which they are ultimately found not criminally responsible (NCR), offenders may have contact with the health care and criminal justice systems. Understanding the frequency of these contacts can potentially help to prevent such offences by informing strategies for intervention. In particular, escalation in contact frequency could foreshadow the committing of an index offence. Inspired by real data, in this project, we investigate models that describe such escalation. In particular, we consider two classes of models: time-to-event models that are framed in terms of numbers of contacts in an interval, and time-between-events models that are framed in terms of times between two successive contacts. Both classes of models can incorporate predictor variables and between-subject heterogeneity (via random effects). The properties of the maximum likelihood estimators of the escalation rate and the performance of the Kolmogorov-Smirnov test of goodness-of-fit are assessed using simulations under various scenarios.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

DB versus DC: a comparison of total compensation

Author: 
Date created: 
2018-07-06
Abstract: 

Employer-sponsored pension plans play an important role in providing employees with adequate retirement income. They are expensive and carry some important risks. The employer and its employees share these costs and risks differently depending on the plan design. In this project, two designs are studied, a defined benefit (DB) plan and a defined contribution (DC) plan. They are analyzed in a simple common business setup under the same stochastic economic scenarios generated from a calibrated VAR model. The employer’s total compensation budget is assumed to be constrained so that higher pension contributions are associated with lower salary increases, and vice versa. The two types of plans are compared based on the total compensation, defined as the value of wages and retirement income, received by 25 cohorts of new employees. On an adjusted basis, we find that the two types of plans provide equivalent total compensation to their members.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Gary Parker
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Net best-ball team composition in golf

Author: 
Date created: 
2018-08-07
Abstract: 

This project proposes a simple method of forming two-player and four-player golf teams for the purposes of net best-ball tournaments in stroke play format. The proposal is based on the recognition that variability is an important consideration in team composition; highly variable players contribute greatly in a best-ball setting. A theoretical derivation is provided for the proposed team formation. In addition, simulation studies are carried out which compare the proposal against other methods of team formation. In these studies, the proposed team composition leads to fairer competitions.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

New methods and models in functional data analysis

Author: 
Date created: 
2018-07-23
Abstract: 

Functional data analysis (FDA) plays an important role in analyzing function-valued data such as growth curves, medical images and electromagnetic spectrum profiles. Since dimension reduction can be achieved for infinite-dimensional functional data via functional principal component analysis (FPCA), this technique has attracted substantial attention. We develop an easy-to-implement algorithm to perform FPCA and find that this algorithm compares favorably with traditional methods in numerous applications. Knowing how random functions interact is critical to studying mechanisms like gene regulation and event-related brain activation. A new approach is proposed to calibrate dynamical correlations of random functions and we apply this approach to quantify functional connectivity from medical images. Scalar-on-function regression, which is widely used to characterize the relationship between a functional covariate and a scalar response, is an important ingredient of FDA. We propose several new scalar-on-function regression models and investigate their properties from both theoretical and practical perspectives.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Exploring spatio-temporal patterns in emergency department use for mental health reasons from children and adolescents in Alberta, Canada

Author: 
Date created: 
2018-07-26
Abstract: 

This project analyses mental health related emergency department visits from children and adolescents in Alberta, Canada to understand the spatio-temporal patterns and identify risk factors. The data are extracted for the period 2002-2011 from the provincial health administrative data systems of Alberta. A descriptive data analysis is presented and then generalized linear models are explored to model the spatio-temporal pattern of the emergency department visit counts. The seasonal effect is examined using seasonal factors, sine and cosine functions and cyclic cubic smoothing splines. The spatial and temporal correlation structures are modelled using an autoregressive model of order 1 and conditionally autoregressive random effects. Demographic risk factors and their association with the frequency of mental health related emergency department visits are examined. Estimates of the model parameters are obtained and model diagnostics are performed to assess the fit of the model. Age, gender and a proxy for socio-economic status are found to be important risk factors. The proposed model can be used as a predictive model to help identify regions and groups at a higher risk for mental health related emergency department visits.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
X. Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

The use of submodels as a basis for efficient estimation of complex models

Author: 
Date created: 
2017-11-08
Abstract: 

In this thesis, we consider problems where the true underlying models are complex and obtaining the maximum likelihood estimator (MLE) of the true model is challenging or time-consuming. In our first paper, we investigate a general class of parameter-driven models for time series of counts. Depending on the distribution of the latent variables, these models can be highly complex. We consider a set of simple models within this class as a basis for estimating the regression coefficients in the more complex models. We also derive standard errors (SEs) for these new estimators. We conduct a comprehensive simulation study to evaluate the accuracy and efficiency of our estimators and their SEs. Our results show that, except in extreme cases, the maximizer of the Poisson generalized linear model (the simplest estimator in our context) is an efficient, consistent, and robust estimator with a well-behaved standard error. In our second paper, we work in the context of display advertising, where the goal is to estimate the probability of conversion (a pre-defined action such as making a purchase) after a user clicks on an ad. In addition to accuracy, in this context, the speed with which the estimate can be computed is critical. Again, computing the MLEs of the true model for the observed conversion statuses (which depends on the distribution of the delays in observing conversions) is challenging, in this case because of the huge size of the data set. We use a logistic regression model as a basis for estimation, and then adjust this estimate for its bias. We show that our estimation algorithm leads to accurate estimators and requires far less computation time than does the MLE of the true model. Our third paper also concerns the conversion probability estimation problem in display advertising. We consider a more complicated setting where users may visit an ad multiple times prior to taking the desired action (e.g., making a purchase). We extend the estimator that we developed in our second paper to incorporate information from such visits. We show that this new estimator, the DV-estimator (which accounts for the distributions of both the conversion delay times and the inter-visit times), is more accurate and leads to better confidence intervals than the estimator that accounts only for delay times (the D-estimator). In addition, the time required to compute the DV-estimate for a given data set is only moderately greater than that required to compute the D-estimate, and is substantially less than that required to compute the MLE. In summary, in a variety of settings, we show that estimators based on simple, misspecified models can lead us to accurate, precise, and computationally efficient estimates of both the key model parameters and their standard deviations.

Document type: 
Thesis
Senior supervisor: 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.