Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

A multi-state model for pricing critical illness insurance products

Author: 
Date created: 
2019-08-21
Abstract: 

Due to increasing cases of cancer and other severe illnesses, there is great demand for critical illness insurance products. This project introduces a Markovian multi-state model, based on popular critical illness plans, to describe the policyholder's health condition over time, including diagnosis with dread diseases such as cancer, stroke and heart attack. Critical illness insurance products with life insurance or other optional riders are considered. Following the idea of Baione and Levantesi (Insurance: Mathematics and Economics, 58: 174-184, 2014), we focus on modelling mortality rates, estimating transition probabilities from Canadian prevalence and incidence rates of the covered illnesses, and calculating premium rates based on the multi-state model. Transition intensities under various mortality models and premium rates for critical illness policies under several graduation approaches are also compared.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Yi Lu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.
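
The premium calculation described in the abstract above can be illustrated with a minimal sketch. It assumes a three-state, time-homogeneous annual Markov chain (healthy, critically ill, dead) with made-up transition probabilities; the project itself estimates age-dependent probabilities from Canadian data, so the matrix `P`, the function names, and all numbers here are hypothetical.

```python
# Hypothetical annual transition probabilities (illustrative only, not from
# the project). States: 0 = healthy, 1 = critically ill, 2 = dead.
P = [
    [0.97, 0.02, 0.01],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],
]


def occupancy(P, start, n):
    """State distribution after n annual steps of a homogeneous Markov chain."""
    size = len(P)
    dist = [1.0 if s == start else 0.0 for s in range(size)]
    for _ in range(n):
        dist = [sum(dist[i] * P[i][j] for i in range(size)) for j in range(size)]
    return dist


def ci_single_premium(P, n, interest, benefit=1.0):
    """Expected present value of a benefit paid at the end of the year of
    first diagnosis (healthy -> ill) within an n-year term."""
    v = 1.0 / (1.0 + interest)  # annual discount factor
    p_hh, p_hi = P[0][0], P[0][1]
    # Stay healthy for k years, then be diagnosed in year k + 1.
    return benefit * sum(v ** (k + 1) * p_hh ** k * p_hi for k in range(n))


print(occupancy(P, start=0, n=10))
print(ci_single_premium(P, n=10, interest=0.03, benefit=100_000))
```

A realistic version would replace the constant matrix with one matrix per attained age and add riders (for example a death benefit) as further cash flows on other transitions.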

A moneyness-adapting fee structure for guaranteed benefits embedded in variable annuities: Pricing and valuation

Date created: 
2019-08-21
Abstract: 

Guaranteed minimum death benefit (GMDB) and guaranteed minimum maturity benefit (GMMB) are two common guarantee riders embedded in variable annuities. To cover the financial risks arising from the guarantees, fees are charged based on the underlying fund value; the traditional approach funds the guarantees with a constant fee rate throughout the accumulation phase. This fee structure, however, potentially encourages surrender when the options are out-of-the-money. To reduce this adverse incentive, Bernard et al. (2014) introduced a state-dependent fee, under which fees are charged only when the guarantees are in-the-money or close to being in-the-money. This project proposes a moneyness-adapting fee structure that aims to further reduce the insurer’s reserve. After estimating the fee rates charged for GMDB and/or GMMB under three pricing principles, we compare the performance of the three fee structures through numerical illustrations, using value-at-risk and conditional tail expectation as measures.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Cary Chi-Liang Tsai
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.
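
The two risk measures named in the abstract above, value-at-risk and conditional tail expectation, can be estimated from a sample of simulated losses. The sketch below uses one common empirical convention (VaR as an order statistic, CTE as the mean of the losses ranked above it); the project may use a different estimator, and the loss values are invented for illustration.

```python
import math


def _tail_start(n, alpha):
    """1-based rank of the empirical alpha-quantile in a sample of size n."""
    # The small offset guards against floating-point noise in alpha * n.
    return max(1, math.ceil(alpha * n - 1e-12))


def value_at_risk(losses, alpha):
    """Smallest loss with at least a fraction alpha of the sample at or below it."""
    ordered = sorted(losses)
    return ordered[_tail_start(len(ordered), alpha) - 1]


def conditional_tail_expectation(losses, alpha):
    """Average of the losses ranked above the empirical alpha-quantile."""
    ordered = sorted(losses)
    tail = ordered[_tail_start(len(ordered), alpha):]
    # If no loss ranks above the quantile, fall back to the largest loss.
    return sum(tail) / len(tail) if tail else ordered[-1]


# Hypothetical simulated insurer losses (negative values are gains).
losses = [12.0, -3.5, 0.0, 48.2, 7.1, 95.0, 1.3, 22.4, -8.0, 60.7]
print(value_at_risk(losses, 0.8))
print(conditional_tail_expectation(losses, 0.8))
```

Comparing fee structures then amounts to simulating losses under each structure and comparing these two numbers at a common confidence level.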

Supersaturated designs for screening experiments and strong orthogonal arrays for computer experiments

Author: 
Date created: 
2019-08-23
Abstract: 

This dissertation centers on supersaturated designs and strong orthogonal arrays, which provide useful plans for screening experiments and computer experiments, respectively. Supersaturated designs are a good choice for screening experiments. To use such designs, it is commonly assumed that all interactions are negligible. In this dissertation, this assumption is dropped. We propose and study a new class of supersaturated designs, namely foldover supersaturated designs, which allow the active main effects to be identified without assuming that two-factor interactions are absent. The E(s^2)-optimal foldover supersaturated designs are constructed, and further optimization is also considered for these E(s^2)-optimal supersaturated designs. Strong orthogonal arrays were recently introduced and studied as a class of space-filling designs for computer experiments. This dissertation tackles two important problems that so far have not been addressed in the literature. The first problem is how to develop concrete constructions for strong orthogonal arrays of strength 3. We provide a systematic and comprehensive study of the construction of these arrays, with the aim of achieving better space-filling properties. Besides various characterizing results, three families of arrays of strength 3 are presented. The other important problem is that of design selection for strong orthogonal arrays. We conduct a systematic investigation into this problem, focusing on strong orthogonal arrays of strength 2+ and 2. We first select arrays of strength 2+ by examining their 3-dimensional projections, and then formulate a general framework for the selection of arrays of strength 2 by looking at their 2-dimensional projections. Both theoretical and computational results for arrays are presented.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Boxin Tang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.
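
The E(s^2) criterion mentioned in the abstract above is the average of the squared inner products s_ij over all pairs of distinct columns of a two-level design. The sketch below computes it and also checks the standard property that motivates foldovers: stacking a design on its sign-reversed copy makes every main-effect column orthogonal to every two-factor interaction column. The 6-run, 6-factor design is a made-up example, not one from the dissertation, and the function names are hypothetical.

```python
from itertools import combinations


def e_s2(X):
    """Average squared inner product over all pairs of distinct columns."""
    m = len(X[0])
    pairs = list(combinations(range(m), 2))
    total = 0
    for i, j in pairs:
        s_ij = sum(row[i] * row[j] for row in X)
        total += s_ij ** 2
    return total / len(pairs)


def foldover(X):
    """Stack the design on top of its sign-reversed copy."""
    return [list(row) for row in X] + [[-x for x in row] for row in X]


def main_by_interaction_products(X):
    """Inner products of each main-effect column with each two-factor
    interaction column (the elementwise product of two factor columns)."""
    m = len(X[0])
    return [
        sum(row[i] * row[j] * row[k] for row in X)
        for i in range(m)
        for j, k in combinations(range(m), 2)
    ]


# Hypothetical two-level design with 6 runs and 6 factors (more factors
# than runs minus one, so it is supersaturated). Every column is balanced.
X = [
    [ 1,  1,  1,  1, -1, -1],
    [ 1,  1, -1, -1,  1,  1],
    [ 1, -1,  1, -1,  1, -1],
    [-1,  1, -1,  1, -1,  1],
    [-1, -1,  1, -1, -1,  1],
    [-1, -1, -1,  1,  1, -1],
]

F = foldover(X)
print(e_s2(X), e_s2(F))
print(all(p == 0 for p in main_by_interaction_products(F)))
```

The cancellation in the foldover holds for any starting design, because each three-factor product changes sign when a row is negated.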

A statistical investigation of data from the NHL Combine

Date created: 
2019-05-22
Abstract: 

This project seeks to discover useful information from the NHL Combine results by comparing NHL Central Scouting Service rankings, NHL Draft results and measures of player evaluation. Data management is central to this project and we describe the details of handling datasets including the large and proprietary Combine dataset. Many data management decisions are made based on knowledge from the sport of hockey. Three questions of interest are investigated using modern machine learning techniques such as random forests. Investigation 1 determines whether the Combine serves any purpose in terms of modifying the opinion of Central Scouting. Investigation 2 focuses on which test results of the Combine are important in predicting prospects’ future development. Investigation 3 considers how the Combine results revise Central Scouting’s beliefs.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Covariance-adjusted, sparse, reduced-rank regression with application to imaging-genetics data

Author: 
Date created: 
2019-05-31
Abstract: 

Alzheimer's disease (AD) is one of the most challenging diseases in the world and it is crucial for researchers to explore the relationship between AD and genes. In this project, we analyze data from 179 cognitively normal individuals that contain magnetic resonance imaging measures in 56 brain regions of interest and alternate allele counts of 510 single nucleotide polymorphisms (SNPs) obtained from 33 candidate genes for AD, provided by the AD Neuroimaging Initiative (ADNI). Our objectives are to explore the data structure and prioritize interesting SNPs. Standard linear regression models are inappropriate in this research context because they cannot account for sparsity in the SNP effects or for the spatial correlations between brain regions. Thus, we review the method of covariance-adjusted, sparse, reduced-rank regression (Cov-SRRR), which simultaneously performs variable selection and covariance estimation, and apply it to the data of interest. In our findings, SNP rs16871157 has the highest variable importance probability (VIP) in bootstrapping. Also, the estimated coefficients corresponding to the thickness measures of the temporal lobe area have the largest absolute values and are negative, which is consistent with current AD research.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jinko Graham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Selecting baseline two-level designs using optimality and aberration criteria when some two-factor interactions are important

Author: 
Date created: 
2019-06-14
Abstract: 

The baseline parameterization is less commonly used in factorial designs than the orthogonal parameterization. However, the former is more natural than the latter when there exists a default or preferred setting for each factor in an experiment. The current method selects optimal baseline designs for estimating a main effect model. In this project, we consider the selection of optimal baseline designs when estimates of both main effects and some two-factor interactions are wanted. Any other potentially active effect causes bias in estimation of the important effects. To minimize the contamination of these potentially active effects, we propose a new minimum aberration criterion. Moreover, an optimality criterion is used to minimize the variances of the estimates. Finally, we develop a search algorithm for selecting optimal baseline designs based on these criteria and present some optimal designs of 16 and 20 runs for models with up to three important two-factor interactions.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Boxin Tang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Joint modeling of longitudinal and time-to-event data with the application to kidney transplant data

Author: 
Date created: 
2018-12-13
Abstract: 

This thesis develops novel and powerful statistical methodology for problems arising in kidney transplantation. Firstly, we use functional principal component analysis (FPCA) through conditional expectation to explore the major sources of variation in glomerular filtration rate (GFR) curves. The estimated FPC scores can be used to cluster GFR curves, and ordering the FPC scores can detect abnormal GFR curves. FPCA can also effectively estimate missing GFR values and predict GFR values. Secondly, we propose new joint models with mixed-effect and accelerated failure time (AFT) submodels, in which a piecewise linear function is used to calculate the non-proportional dynamic hazard ratio curve of a time-dependent side event. The finite-sample performance of the proposed method is investigated in simulation studies, and the method is demonstrated by fitting the joint model to clinical kidney data. Thirdly, we develop a joint model that combines FPCA with a multi-state model to fit the longitudinal and multiple time-to-event outcomes together. FPCA is efficient in reducing the dimension of the longitudinal trajectories, and the multi-state submodel describes the dynamic process of the multiple time-to-event outcomes. The relationships between the longitudinal and time-to-event outcomes can be assessed through the shared latent features. In the application example, the latent FPC scores are significantly related to the time-to-event outcomes, and the Cox model may be biased for multiple time-to-event outcomes compared with the multi-state model. Fourthly, we develop a flexible class of joint models with generalized linear latent variables for multivariate responses, with underlying Gaussian latent processes. The model accommodates any mixture of outcomes from the exponential family. A Monte Carlo EM algorithm is proposed for estimating the parameters and the variance components of the latent processes. We demonstrate this methodology with kidney transplant studies.
Finally, in many social and health studies, measurements of some covariates are available only for groups of subjects rather than for individuals. Such measures are referred to as aggregate average exposures. The current method fails to evaluate high-order or nonlinear effects of aggregated exposures. We therefore develop a nonparametric method based on local linear fitting to overcome this difficulty. We demonstrate this methodology with kidney transplant studies.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jiguo Cao
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Multidimensional scaling for phylogenetics

Author: 
Date created: 
2019-04-11
Abstract: 

We study a novel approach to determining the phylogenetic tree based on multidimensional scaling and the Euclidean Steiner minimum tree. A pairwise sequence alignment method is used to align objects such as DNA sequences, and evolutionary models are then applied to obtain the estimated distance matrix. Given a distance matrix, multidimensional scaling is widely used to reconstruct a map that gives the coordinates of the data points in a lower-dimensional space while preserving the distances. We apply both classical multidimensional scaling and Bayesian multidimensional scaling to the distance matrix to obtain the coordinates of the objects. Based on these coordinates, the Euclidean Steiner minimum tree can be constructed and can serve as a candidate for the phylogenetic tree. The results of the simulation study indicate that using the Euclidean Steiner minimum tree as a phylogenetic tree is feasible.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.
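
The first step described in the abstract above, turning aligned sequences into an estimated distance matrix, can be sketched as follows. The abstract does not say which evolutionary models are used; the Jukes-Cantor (JC69) correction below is chosen purely as a simple, well-known example, and the sequences and function names are hypothetical. The sequences are assumed to be already aligned, of equal length, and free of gaps.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b) / len(seq_a)


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    if p == 0.0:
        return 0.0
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise JC69 distances with a zero diagonal."""
    n = len(seqs)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = jc69_distance(seqs[i], seqs[j])
    return D


# Hypothetical aligned DNA fragments.
seqs = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTTA"]
for row in distance_matrix(seqs):
    print(row)
```

A matrix of this kind is the input that multidimensional scaling would then embed in a low-dimensional Euclidean space.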

Predicting ovarian cancer survival times: Feature selection and performance of parametric, semi-parametric, and random survival forest methods

Author: 
Date created: 
2019-04-23
Abstract: 

Survival time predictions have far-reaching implications. For example, such predictions can be influential in constructing a personalized treatment plan that is of benefit to both physicians and patients. Advantages include planning the best course of treatment considering the allocation of health care services and resources, as well as the patient's overall health or personal wishes. Predictions also play an important role in providing realistic expectations and subsequently managing quality of life for the patient's residual lifetime. Unfortunately, survival data can be highly variable, making precise predictions difficult or impossible. This project explores methods of predicting time to death for ovarian cancer patients. The dataset consists of a multitude of predictors, including some that may be unimportant. The performances of various prediction methods that allow for feature selection (the Weibull model, Cox proportional hazards model, and the random survival forest) are evaluated. Prediction errors are assessed using Harrell's concordance index and a version of the expected integrated Brier score. We find that the Weibull and Cox models provide the best predictions of survival distributions in this context. Moreover, we are able to identify subsets of predictors that lead to reduced prediction error and are clinically meaningful.

Document type: 
Graduating extended essay / Research project
Senior supervisor: 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.
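
Harrell's concordance index, one of the two accuracy measures named in the abstract above, can be computed directly from its definition. The sketch below is a plain quadratic-time version under common conventions (a pair is comparable only when the shorter time is an observed death, pairs with tied times are skipped, and tied risk scores count one half); the function name and the toy data are hypothetical, and the project may handle ties differently.

```python
def harrell_c(times, events, risk_scores):
    """Fraction of comparable pairs in which the subject who died earlier
    was assigned the higher predicted risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored time cannot be the shorter one in a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: survival times in months, event = 1 for death and
# 0 for censoring, and a model's risk score (higher = worse prognosis).
times = [5, 12, 20, 33, 41]
events = [1, 1, 0, 1, 0]
risk = [2.1, 1.7, 1.9, 0.4, 0.2]
print(harrell_c(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the index is often reported alongside a calibration-sensitive measure such as the Brier score.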

An efficient statistical method of detecting introgressive events from big genomic data

Author: 
Date created: 
2019-04-09
Abstract: 

Introgressive hybridization, also called introgression, is the gene flow from one species to another due to mating between species. The genetic signals of introgression are not always obvious. Current methods of detecting introgressive events rely on the analysis of orthologous markers, and therefore do not consider gene duplication and gene loss. Since introgression leaves a phylogenetic signal similar to that of horizontal gene transfer, introgression events can be detected under a gene tree-species tree reconciliation framework, which simultaneously accounts for evolutionary mechanisms including gene duplication, gene loss, and gene transfer. In this work, the reconciliation-based method is applied to a large dataset of Anopheles mosquito genomes. We recover extensive introgression in the gambiae complex, a group of African mosquitoes, although with some variations compared to previous reports. Our results also suggest a possible ancient introgression between the Asian and African mosquitoes.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.