Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Statistical Inference under Latent Class Models, with Application to Risk Assessment in Cancer Survivorship Studies

Author: 
Date created: 
2015-11-12
Abstract: 

Motivated by a cancer survivorship program, this PhD thesis aims to develop methodology for risk assessment, classification, and prediction. We formulate the primary data collected from a cohort with two underlying categories, the at-risk and not-at-risk classes, using latent class models, and we conduct both cross-sectional and longitudinal analyses. We begin with a maximum pseudo-likelihood estimator (pseudo-MLE) as an alternative to the maximum likelihood estimator (MLE) under a mixture Poisson distribution with event counts. The pseudo-MLE utilizes supplementary information on the not-at-risk class from a different population. It reduces the computational intensity and potentially increases the estimation efficiency. To obtain statistical methods that are more robust than likelihood-based methods to distribution misspecification, we adapt the well-established generalized estimating equations (GEE) approach under the mean-variance model corresponding to the mixture Poisson distribution. The inherent computing and efficiency issues in the application of GEEs motivate two sets of extended GEEs, using the primary data supplemented by information from the second population alone or together with the available information on individuals in the cohort who are deemed to belong to the at-risk class. We derive asymptotic properties of the proposed pseudo-MLE and the estimators from the extended GEEs, and we estimate their variances by extended Huber sandwich estimators. We use simulation to examine the finite-sample properties of the estimators in terms of both efficiency and robustness. The simulation studies verify the consistency of the proposed parameter estimators and their variance estimators. They also show that the pseudo-MLE has efficiency comparable to that of the MLE, and the extended GEE estimators are robust to distribution misspecification while maintaining satisfactory efficiency. 
Further, we present an extension of the favourable extended GEE estimator to longitudinal settings by adjusting for within-subject correlation. The proposed methodology is illustrated with physician claims from the cancer program. We fit different latent class models for the counts and costs of the physician visits by applying the proposed estimators. We use the parameter estimates to identify the risk of subsequent and ongoing problems arising from the subjects’ initial cancer diagnoses. We perform risk classification and prediction using the fitted latent class models.
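The risk classification step mentioned in this abstract can be illustrated with Bayes' rule under a two-class latent class model. The sketch below assumes, purely for illustration, that event counts in both classes are Poisson with different rates. The function names and the parameters `p_at_risk`, `rate_at_risk`, and `rate_not_at_risk` are hypothetical and are not taken from the thesis.

```python
import math


def poisson_pmf(count, rate):
    """Probability of observing `count` events under a Poisson(rate) model."""
    return math.exp(-rate) * rate ** count / math.factorial(count)


def posterior_at_risk(count, p_at_risk, rate_at_risk, rate_not_at_risk):
    """Posterior probability of the at-risk class given an observed count.

    Assumption (not from the thesis): both latent classes are Poisson,
    differing only in their rates.
    """
    numerator = p_at_risk * poisson_pmf(count, rate_at_risk)
    denominator = numerator + (1.0 - p_at_risk) * poisson_pmf(count, rate_not_at_risk)
    return numerator / denominator
```

A subject could then be classified as at-risk when this posterior probability exceeds a chosen cutoff, for example 0.5.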

Document type: 
Thesis
File(s): 
Senior supervisor: 
X. Joan Hu
John J. Spinelli
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Application of Relational Models in Mortality Immunization

Author: 
Date created: 
2015-07-29
Abstract: 

The prediction of future mortality rates by any existing mortality projection models is hardly to be exact, which causes an exposure to mortality and longevity risks for life insurance companies. Since a change in mortality rates has opposite impacts on the surpluses of life insurance and annuity products, hedging strategies of mortality and longevity risks can be implemented by creating an insurance portfolio of both life insurance and annuity products. In this project, we develop a framework of implementing non-size free matching strategies to hedge against mortality and longevity risks. We apply relational models to capture the mortality movements by assuming that the simulated mortality sequence is a proportional and/or a constant change of the expected one, and the amount of the changes varies in the length of the sequence. With the magnitude of the proportional and/or constant changes, we determine the optimal weights of allocating the life insurance and annuity products in a portfolio for mortality immunization according to each of the proposed matching strategies. Comparing the hedging performance of non-size free matching strategies with size free ones proposed by Lin and Tsai (2014), we demonstrate that non-size free matching strategies can hedge against mortality and longevity risks more effectively than the corresponding size free ones.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Cary Tsai
Department: 
Science: Department of Biomedical Physiology and Kinesiology
Thesis type: 
(Thesis) M.Sc.

Understanding the impact of heteroscedasticity on the predictive ability of modern regression methods

Date created: 
2015-08-17
Abstract: 

As the size and complexity of modern data sets grow, more and more prediction methods are developed. Despite the growing sophistication of methods, there is not a well-developed literature on how heteroscedasticity affects modern regression methods. We aim to understand the impact of heteroscedasticity on the predictive ability of modern regression methods. We accomplish this by reviewing the visualization and diagnosis of heteroscedasticity, as well as developing a measure for quantifying it. These methods are used on 42 real data sets in order to understand the prevalence and magnitude "typical" of data. We use the knowledge from this analysis to develop a simulation study that explores the predictive ability of nine regression methods. We vary a number of factors to determine how they influence prediction accuracy in conjunction with, and separately from, heteroscedasticity. These factors include data linearity, the number of explanatory variables, the proportion of unimportant explanatory variables, and the signal-to-noise ratio. We compare prediction accuracy with and without a variance-stabilizing log-transformation. The predictive ability of each method is compared by using the mean squared error, which is a popular measure of regression accuracy, and the median absolute standardized deviation, a measure that accounts for the potential of heteroscedasticity.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Thomas Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

A Pseudo Non-Parametric Buhlmann Credibility Approach to Modeling Mortality Rates

Author: 
Date created: 
2015-07-29
Abstract: 

Credibility theory is applied in property and casualty insurance to perform prospective experience rating, i.e., to determine the future premiums to charge based on both past experience and the underlying group rate. Insurance companies assign a credibility factor Z to a specific policyholder’s own past data, and put 1 − Z onto the prior mean, which is the group rate determined by actuaries to reflect the expected value for all risk classes. This partial credibility takes advantage of both the policyholder’s own experience and the entire group’s characteristics, and thus increases the accuracy of the estimated value so that the insurance companies can stay competitive in the market. Given its popular applications in property and casualty insurance, this project aims to apply the credibility theory to projected mortality rates from three existing mortality models. The approach presented in this project violates one of the conditions, and thus produces the pseudo non-parametric Bühlmann estimates of the forecasted mortality rates. Numerical results show that the accuracy of forecasted mortality rates is significantly improved after applying the non-parametric Bühlmann method to the Lee-Carter model, the CBD model, and the linear regression-random walk (LR-RW) model. A measure of mean absolute percentage error (MAPE) is adopted to compare the performances in terms of accuracy of mortality prediction.
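The credibility weighting and the MAPE criterion described in this abstract can be sketched in a few lines. This is a minimal illustration of the generic formulas (Z times the policyholder's own mean plus 1 − Z times the prior mean, and the mean of absolute relative errors expressed in percent). It does not show how the project estimates Z. The function and argument names are hypothetical.

```python
def credibility_estimate(own_mean, prior_mean, z):
    """Credibility-weighted estimate: z * own experience + (1 - z) * prior mean."""
    if not 0.0 <= z <= 1.0:
        raise ValueError("credibility factor z must lie in [0, 1]")
    return z * own_mean + (1.0 - z) * prior_mean


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero (true for mortality rates).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)
```

For instance, `credibility_estimate(0.02, 0.01, 0.25)` puts a quarter of the weight on the observed rate of 0.02 and the rest on the prior rate of 0.01.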

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Cary Tsai
Department: 
Science:
Thesis type: 
(Project) M.Sc.

An approach to constructing "good" two-level orthogonal factorial designs with large run sizes

Author: 
Date created: 
2015-07-20
Abstract: 

Due to the increasing demand for two-level fractional factorials in areas of science and technology, it is highly desirable to have a simple and convenient method available for constructing optimal factorials. Minimum G_2-aberration is a popular criterion to use for selecting optimal designs. However, direct application of this criterion is challenging for large designs. In this project, we propose an approach to constructing a "good" factorial with a large run size using two small minimum G_2-aberration designs. Theoretical results are derived that allow the word length pattern of the large design to be obtained from those of the two small designs. Regular 64-run factorials are used to evaluate this approach. The designs from our approach are very close to the corresponding minimum aberration designs, and they are even equivalent to the corresponding minimum aberration designs, when the number of factors is large.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Boxin Tang
Department: 
Science:
Thesis type: 
(Project) M.Sc.

How does climate change affect forest fire rate in British Columbia?

Author: 
Date created: 
2015-08-20
Abstract: 

Climate change is known to be an important risk factor for forest fire. Studies have shown an increased risk of fire because of rising temperatures, drier conditions, more lightning from stronger storms, added dry fuel for fires and a longer fire season, and that "global warming makes forests more susceptible to fire." In this paper, we use modern functional data analysis methods to explore the variations of forest fire rate in British Columbia, Canada, over 63 consecutive years (1950-2012), and to investigate the historical effect of temperature and precipitation on forest fire rate. Functional principal component analysis shows that forest fire rate has increased since 2004 compared to the years before that. A historical functional linear model shows that the concurrent effects of temperature and precipitation are both strong. Higher temperature and less precipitation lead to more forest fire. Temperature from January to July has a historical effect on forest fire rate from August to November, while only a short-term effect of precipitation, up to two months, is detected.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Bin Zhao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Threshold-free measure for assessing the performance of risk prediction with censored data

Author: 
Date created: 
2015-07-24
Abstract: 

The area under the receiver operating characteristic curve (AUC) is a popular threshold-free metric to retrospectively measure the discriminatory performance of medical tests. In risk prediction or medical screening, interest often focuses on accurately predicting the future risk of an event of interest or prospectively stratifying individuals into risk categories. Thus, AUC might not be optimal in assessing the predictive performance for such purposes. Alternative accuracy measures have been proposed, such as the positive predictive value (PPV). Yuan et al. (2015) proposed a threshold-free metric, the average positive predictive value (AP), which is the area under the PPV versus true positive fraction (TPF) curve, when the outcome is binary disease status. In this thesis, we propose the time-dependent AP when the outcome is censored event time. Empirical estimates of the time-dependent AP (AP_t0) are developed, where the inverse weighted probability technique is applied to deal with censoring. In addition, inference procedures using bootstrap and perturbation resampling are proposed to construct confidence intervals. We conduct simulation studies to investigate the performance of the proposed estimation and inference procedures in finite samples. The method is also illustrated through a real data analysis.
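For the binary-outcome case described in this abstract, one common empirical form of AP averages the PPV over thresholds set at each case's risk score. The sketch below uses that form as an assumption. It is not the time-dependent estimator proposed in the project and it does not handle censoring. The function name is hypothetical.

```python
def average_positive_predictive_value(scores, labels):
    """Empirical AP for a binary outcome (1 = case, 0 = non-case).

    For each case, compute the PPV among subjects whose score is at least
    that case's score, then average these PPVs over all cases.
    """
    case_scores = [s for s, d in zip(scores, labels) if d == 1]
    if not case_scores:
        raise ValueError("at least one case (label 1) is required")
    total = 0.0
    for threshold in case_scores:
        # Disease status of everyone flagged as high risk at this threshold.
        flagged = [d for s, d in zip(scores, labels) if s >= threshold]
        total += sum(flagged) / len(flagged)
    return total / len(case_scores)
```

A risk score that ranks every case above every non-case gives an AP of 1, while a less discriminating score gives a smaller value.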

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Qian Zhou
Yan Yuan
Department: 
Science: Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

The optimal payment reduction ratios for a catastrophe bond

Author: 
Date created: 
2015-01-15
Abstract: 

Catastrophe bonds, also known as CAT bonds, are insurance-linked securities that help to transfer catastrophe risks from the insurance industry to bond holders. If there is a catastrophe, the CAT bond is triggered and the future bond payments are reduced. This project first presents a general pricing formula for a CAT bond with coupon payments, which can be adapted to various assumptions for a catastrophe loss process. Next, it gives formulas for the optimal payment reduction ratios which maximize two measurements of risk reduction, hedge effectiveness rate (HER) and hedge effectiveness (HE), respectively, and examines how the optimal payment reduction ratios help reinsurance or insurance companies to mitigate extreme catastrophe losses. Last, it shows how strike price, maturity, parameters of the catastrophe loss process and different interest rate assumptions affect the optimal payment reduction ratios. Numerical examples are also given for illustration.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Cary Tsai
Department: 
Science: Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Hierarchical Bayesian modeling of health insurance claims

Author: 
Date created: 
2015-03-06
Abstract: 

The purpose of this project is to propose a statistical model for health insurance total claim amounts classified by age group, region of residence and time horizon of the insured population under a Bayesian framework. This model can be used to predict future total claim amounts and thus to facilitate premium determination. The prediction is based on the past observed information and prior beliefs about the insured population, number of claims and amount of claims. The insured population growth is modelled by a generalized exponential growth model (GEGM), which takes into account the random effects in age, region and time classifications. The number of claims for each classified group is assumed to be Poisson distributed and independent of the size of the individual claims. A simulation study is conducted to test the effectiveness of modelling and estimation, and Markov chain Monte Carlo (MCMC) is used for parameter estimation. Based on the predicted values, the premiums are estimated using four premium principles and two risk measures.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Yi Lu
Tim Swartz
Department: 
Science:
Thesis type: 
(Project) M.Sc.

Natural Hedging Using Multi-population Mortality Forecasting Models

Author: 
Date created: 
2014-12-11
Abstract: 

No mortality projection model can capture future mortality changes accurately, so the actual mortality rates differ from the projected ones. The movement of mortality rates has opposite impacts on the values of life insurance and annuity products, which creates a chance of a natural hedge for both the life insurer and the annuity provider. A life insurer and an annuity provider can swap their life insurance and annuity business with each other to form their own portfolios for a natural hedge. This project is mainly focused on determining the weights of a portfolio of life insurance and annuity products by minimizing the variance of the loss function of the portfolio to reduce mortality and longevity risks for each of the life insurer and the annuity provider. Four Lee-Carter-based models are applied to model the co-movement of two populations of life insurance and annuity insureds, and then to determine the weights for comparison. The block bootstrap method, a model-/parameter-free approach, is also adopted with numerical illustrations to compare the hedging performances among the four models.
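The variance-minimizing weight mentioned in this abstract has a closed form in the simplest two-product case. The sketch below assumes a portfolio loss of the form w × (life loss) + (1 − w) × (annuity loss) and simulated loss samples for each product. The project's actual loss function and constraints may differ, and all names here are hypothetical.

```python
def sample_covariance(x, y):
    """Unbiased sample covariance of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def min_variance_weight(life_losses, annuity_losses):
    """Weight w on the life product minimizing Var(w*L + (1 - w)*A).

    Setting the derivative of the portfolio variance to zero gives
    w = (Var(A) - Cov(L, A)) / (Var(L) + Var(A) - 2*Cov(L, A)).
    """
    var_life = sample_covariance(life_losses, life_losses)
    var_annuity = sample_covariance(annuity_losses, annuity_losses)
    cov = sample_covariance(life_losses, annuity_losses)
    denominator = var_life + var_annuity - 2.0 * cov
    if denominator == 0.0:
        raise ValueError("losses are identical up to a shift; weight is undefined")
    return (var_annuity - cov) / denominator
```

When the two losses move in exactly opposite directions with equal variance, the weight is 0.5 and the portfolio variance is zero, which is the ideal natural hedge.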

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Dr. Cary Chi-Liang Tsai
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.