# Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

## Numerical approximation algorithms for pension funding

Author:
Date created:
2021-08-04
Abstract:

It is difficult to find closed-form optimal decisions in the context of pension plans. Therefore, we often need to rely on numerical algorithms to find approximate optimal decisions. In this report, we present two numerical algorithms that can be applied to solve optimal pension funding problems: the value function approximation and the grid value approximation. The value function approximation method applies to models with infinite time horizons and approximates the parameters of the value function by minimizing the difference between the true and approximate evaluations of the Hamilton–Jacobi–Bellman (HJB) equation. The grid value approximation method is used for models with finite time horizons. It works iteratively with backward and forward stages and approximates the optimal decisions directly without using the HJB equation. Numerical results are presented to compare approximate and true solutions for optimal contributions and share in risky assets for classic problems in the pension literature.

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Jean-François Bégin
Barbara Sanders
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.

## Autoregressive mixed effects models and an application to annual income of cancer survivors

Author:
Date created:
2021-04-26
Abstract:

Longitudinal observations of income are often strongly autocorrelated, even after adjusting for independent variables. We explore two common longitudinal models that allow for residual autocorrelation: 1. the autoregressive error model (a linear mixed effects model with an AR(1) covariance structure), and 2. the autoregressive response model (a linear mixed effects model that includes the first lag of the response variable as an independent variable). We explore the theoretical properties of these models and illustrate the behaviour of parameter estimates using a simulation study. Additionally, we apply the models to a data set containing repeated (annual) observations of income and sociodemographic variables on a sample of breast cancer survivors. Our preliminary results suggest that the autoregressive response model may severely overestimate the magnitude of the effect of cancer. Our findings will guide future, comprehensive study of the short- and long-term effects of a breast cancer diagnosis on a survivor’s annual net income.
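One way to write the two models side by side is sketched below. The notation is an assumption introduced for illustration and is not taken from the project: $y_{it}$ is the income of subject $i$ in year $t$, $x_{it}$ a covariate vector, $b_i$ a subject-level random intercept (the project may use a richer random-effects structure), $\rho$ the AR(1) correlation parameter, and $\gamma$ the coefficient on the lagged response.

```latex
\begin{align}
% Autoregressive error model: AR(1) structure on the within-subject errors
y_{it} &= x_{it}^\top \beta + b_i + \varepsilon_{it},
& \varepsilon_{it} &= \rho\,\varepsilon_{i,t-1} + \eta_{it},
& \eta_{it} &\overset{\text{iid}}{\sim} N(0, \sigma_\eta^2), \\
% Autoregressive response model: first lag of the response as a covariate
y_{it} &= x_{it}^\top \beta + \gamma\, y_{i,t-1} + b_i + \varepsilon_{it},
& \varepsilon_{it} &\overset{\text{iid}}{\sim} N(0, \sigma^2),
\end{align}
% In both models the random intercept is b_i ~ N(0, \sigma_b^2).
```

Under this notation, $\beta$ carries a different interpretation in each model: in the second it describes covariate effects conditional on the previous year's response, so the two sets of estimates are not directly comparable.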

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Rachel Altman
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.

## Sequence clustering for genetic mapping of binary traits

Author:
Date created:
2021-08-24
Abstract:

Sequence relatedness has potential application to fine-mapping genetic variants contributing to inherited traits. We investigate the utility of genealogical tree-based approaches to fine-map causal variants in three different projects. In the first project, through coalescent simulation, we compare the ability of several popular methods of association mapping to localize causal variants in a sub-region of a candidate genomic region. We consider four broad classes of association methods, which we describe as single-variant, pooled-variant, joint-modelling and tree-based, under an additive genetic-risk model. We also investigate whether differentiating case sequences based on their carrier status for a causal variant can improve fine-mapping. Our results lend support to the potential of tree-based methods for genetic fine-mapping of disease. In the second project, we develop an R package to dynamically cluster a set of single-nucleotide variant sequences. The resulting partition structures provide important insight into the sequence relatedness. In the third project, we investigate the ability of methods based on sequence relatedness to fine-map rare causal variants and compare it to that of genotypic association methods. Since the true gene genealogy is unknown in practice, we apply the methods developed in the second project to estimate the sequence relatedness. We also pursue the idea of reclassifying case sequences according to their carrier status using genealogical nearest neighbours. We find that methods based on sequence relatedness are competitive for fine-mapping rare causal variants. We propose some general recommendations for fine-mapping rare variants in case-control association studies.

Document type:
Thesis
File(s):
Supervisor(s):
Jinko Graham
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Thesis) Ph.D.

## Post-selection inference

Author:
Date created:
2021-04-21
Abstract:

Forward stepwise selection is a widely used model selection algorithm. It is, however, hard to do inference for a model that has already been cherry-picked. We investigate a post-selection inference method called selective inference. Beginning with very simple examples and working towards more complex ones, we evaluate the method's performance in terms of its power and coverage probability through a simulation study. We investigate the target of inference and the impact of the amount of information used to construct conditional confidence intervals. To achieve the same coverage probability, the more conditions we use, the wider the confidence interval becomes, and the effect can be extreme. Moreover, we investigate the impact of multiple conditioning, as well as the importance of the normality assumption on which the underlying theory is based. For models with relatively few parameters (p << n), we find that normality is not crucial in terms of the coverage probability.
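The difficulty of inference after cherry-picking can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not taken from the project: every true coefficient is zero, the design is orthogonal with known unit variance (so the estimated coefficients behave like independent standard normal z-statistics), and only the first forward-selection step is considered. The function name `naive_coverage` is hypothetical.

```python
import random


def naive_coverage(n_predictors=10, n_reps=5000, z_crit=1.96, seed=0):
    """Estimate how often a naive 95% interval covers the true coefficient (0)
    for the predictor picked at the first forward-selection step."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_reps):
        # Assumption: independent N(0, 1) z-statistics, all true effects zero.
        z = [rng.gauss(0.0, 1.0) for _ in range(n_predictors)]
        # The first forward step selects the predictor with the largest |z|.
        selected = max(z, key=abs)
        # The naive interval selected +/- z_crit covers 0 iff |selected| < z_crit.
        if abs(selected) < z_crit:
            covered += 1
    return covered / n_reps


if __name__ == "__main__":
    print(f"naive coverage after selection: {naive_coverage():.3f}")
```

Under these assumptions the naive interval covers the truth with probability of about 0.95^10, or roughly 0.60, rather than the nominal 0.95. Selective inference aims to restore the nominal level by conditioning on the selection event.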

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Richard Lockhart
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.

## An efficient approach to pruning regression trees using a modified Bayesian information criterion

Author:
Date created:
2021-04-14
Abstract:

By identifying relationships between regression tree construction and change-point detection, we show that it is possible to prune a regression tree efficiently using properly modified information criteria. We prove that one of the proposed pruning approaches that uses a modified Bayesian information criterion consistently recovers the true tree structure provided that the true regression function can be represented as a subtree of a full tree. In practice, we obtain simplified trees that can have prediction accuracy comparable to trees obtained using standard cost-complexity pruning. We briefly discuss an extension to random forests that prunes trees adaptively in order to prevent excessive variance, building upon the work of other authors.

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Thomas Loughin
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.

## Some new methods and models in functional data analysis

Author:
Date created:
2020-06-19
Abstract:

With new developments in modern technology, data are recorded continuously on a large scale over finer and finer grids. Such data push forward the development of functional data analysis (FDA), which analyzes information on curves or functions. Analyzing functional data is intrinsically an infinite-dimensional problem. The functional partial least squares method is a useful tool for dimension reduction. In this thesis, we propose a sparse version of the functional partial least squares method that is easy to interpret. Another problem of interest in FDA is the functional linear regression model, which extends the linear regression model to the functional context. We propose a new method to study the truncated functional linear regression model, which assumes that the functional predictor does not influence the response once time passes a certain cutoff point. Motivated by a recent study of the instantaneous in-game win probabilities for the National Rugby League, we develop novel FDA techniques to determine the distributions in a Bayesian model.

Document type:
Thesis
File(s):
Supervisor(s):
Jiguo Cao
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Thesis) Ph.D.

## Systematic comparison of designs and emulators for computer experiments using a library of test functions

Author:
Date created:
2020-12-16
Abstract:

As computational resources have become faster and more economical, scientific research has transitioned from using only physical experiments to using simulation-based exploration. A body of literature has since grown aimed at the design and analysis of so-called computer experiments. While this literature is large and active, little work has focused on comparing methods. This project presents ways of comparing and evaluating both design and emulation methods for computer experiments. Using a suite of test functions (in this work we introduce the Virtual Library of Computer Experiments), a procedure is established that can provide guidance on how to proceed in simulation problems. An illustrative comparison is performed for each context, putting three emulators and then four experimental designs up against each other, while also highlighting possible considerations for test function choice.

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Derek Bingham
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.

## Efficient Bayesian parameter inference for COVID-19 transmission models

Author:
Date created:
2020-12-17
Abstract:

Many transmission models have been proposed and adapted to reflect changes in policy for mitigating the spread of COVID-19. Often these models are applied without any formal comparison with previously existing models. In this project, we use an annealed sequential Monte Carlo (ASMC) algorithm to estimate the parameters of these transmission models. We also use Bayesian model selection to provide a framework through which the relative performance of transmission models can be compared in a statistically rigorous manner. The ASMC algorithm provides an unbiased estimate of the marginal likelihood, which can be computed at no additional computational cost. This offers a significant computational advantage over MCMC methods, which require expensive post hoc computation to estimate the marginal likelihood. We find that ASMC can produce results that are comparable to MCMC in a fraction of the time.
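The idea that the marginal likelihood comes out of an annealed SMC run as a by-product can be sketched on a toy problem. The code below is an illustrative assumption, not the project's implementation: it swaps the COVID-19 transmission model for a conjugate normal model (prior θ ~ N(0, 1), one observation y | θ ~ N(θ, 1)) so the answer can be checked in closed form, and it uses a linear temperature schedule, multinomial resampling at every step, and a random-walk Metropolis move. All function names are hypothetical.

```python
import math
import random


def log_prior(theta):
    # Toy assumption: standard normal prior on theta.
    return -0.5 * theta * theta - 0.5 * math.log(2 * math.pi)


def log_lik(theta, y):
    # Toy assumption: a single observation y ~ N(theta, 1).
    return -0.5 * (y - theta) ** 2 - 0.5 * math.log(2 * math.pi)


def true_log_marginal(y):
    # Closed form for the toy model: marginally, y ~ N(0, 2).
    return -0.5 * math.log(2 * math.pi * 2.0) - y * y / 4.0


def annealed_smc(y, n_particles=1000, n_steps=40, step_sd=1.0, seed=1):
    """Return (log marginal likelihood estimate, final particles)."""
    rng = random.Random(seed)
    betas = [k / n_steps for k in range(n_steps + 1)]  # linear schedule 0 -> 1
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # prior draws
    log_z = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental weights: likelihood raised to the temperature increment.
        log_w = [(b - b_prev) * log_lik(t, y) for t in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # The running product of average weights estimates the marginal likelihood.
        log_z += m + math.log(sum(w) / n_particles)
        # Multinomial resampling, then one Metropolis move per particle
        # targeting prior(theta) * likelihood(theta)^b.
        particles = rng.choices(particles, weights=w, k=n_particles)
        moved = []
        for t in particles:
            prop = t + rng.gauss(0.0, step_sd)
            log_acc = (log_prior(prop) + b * log_lik(prop, y)) - (
                log_prior(t) + b * log_lik(t, y)
            )
            # 1 - random() lies in (0, 1], which keeps the log finite.
            moved.append(prop if math.log(1.0 - rng.random()) < log_acc else t)
        particles = moved
    return log_z, particles


if __name__ == "__main__":
    est, final = annealed_smc(1.5)
    print(f"estimated log Z: {est:.3f}, exact log Z: {true_log_marginal(1.5):.3f}")
```

The estimate `log_z` is accumulated inside the same loop that moves the particles towards the posterior, which is why no separate post hoc computation is needed. Comparing such estimates across candidate models is the basis of Bayesian model selection.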

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Liangliang Wang
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.

## Contextual batting and bowling in limited overs cricket

Author:
Date created:
2020-11-25
Abstract:

Cricket is a sport for which many batting and bowling statistics have been proposed. However, a feature of cricket is that the level of aggressiveness adopted by batsmen depends on match circumstances. It is therefore relevant to consider these circumstances when evaluating batting and bowling performances. This project considers batting performance in the second innings of limited overs cricket when a target has been set. The runs required, the number of overs completed and the wickets taken are relevant in assessing the batting performance. We produce a visualization for second innings batting that describes how a batsman performs under different circumstances. The visualization is then reduced to a single statistic, “clutch batting”, which can be used to compare batsmen. An analogous analysis is then provided for bowlers based on the symmetry between batting and bowling, and we define a corresponding statistic, “clutch bowling”.

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Harsha Perera
Timothy Swartz
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.

## A flexible group benefits framework for pricing deposit rates

Author:
Date created:
2020-11-16
Abstract:

Currently, most flexible group benefit plans are designed and priced based on deterministic assumptions about the plan members’ option selections. This can cause an adverse selection spiral, threatening the sustainability of the plan. We therefore propose a comprehensive framework with a novel pricing formula that incorporates both a model for claims and a model for plan members’ enrollment decisions to prevent adverse selection. We find through simulation that our proposed pricing formula outperforms the traditional pricing practice by keeping flex plans sustainable over time. In addition to preventing the adverse selection spiral through pricing, our framework also serves as a tool to evaluate the impact of other factors such as changes in plan designs, health costs, and member decisions.

Document type:
Graduating extended essay / Research project
File(s):
Supervisor(s):
Jean-François Bégin
Barbara Sanders
Department:
Science: Department of Statistics and Actuarial Science
Thesis type:
(Project) M.Sc.