Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Exploring out of distribution: Deep neural networks and the human brain

Author: 
Date created: 
2021-08-17
Abstract: 

Deep neural networks have achieved state-of-the-art performance across a wide range of tasks. Convolutional neural networks, with their ability to learn complex spatial features, have surpassed human-level accuracy on many image classification problems. However, these architectures are still often unable to make accurate predictions when the test data distribution differs from that of the training data. In contrast, humans naturally excel at such out-of-distribution generalizations. Novel solutions have been developed to improve a deep neural net's ability to handle out-of-distribution data. Methods such as Push-Pull and AugMix have improved model robustness and generalization. We are interested in assessing whether or not such models achieve the most human-like generalization across a wide variety of image classification tasks. We identify AugMix as the most human-like deep neural network under our set of benchmarks. Identifying such models sheds light on human cognition and the analogy between neural nets and the human brain. We also show that, contrary to our intuition, transfer learning worsens the performance of Push-Pull.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Lloyd T. Elliott
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

On the Bayesian estimation of jump-diffusion models in finance

Date created: 
2021-05-19
Abstract: 

The jump-diffusion framework introduced by Duffie et al. (2000) encompasses most one-factor models used in finance. Due to the model complexity of this framework, the particle filter (e.g., Hurn et al., 2015; Jacobs & Liu, 2018) and combinations of Gibbs and Metropolis-Hastings samplers (e.g., Eraker et al., 2003; Eraker, 2004) have been the tools of choice for its estimation. However, Bégin & Boudreault (2020) recently showed that the discrete nonlinear filter (DNF) of Kitagawa (1987) can also be used for fast and accurate maximum likelihood estimation of jump-diffusion models. In this project report, we combine the DNF with Markov chain Monte Carlo (MCMC) methods for Bayesian estimation in the spirit of the particle MCMC algorithm of Andrieu et al. (2010). In addition, we show that derivative prices (i.e., European option prices) can be easily included in the DNF’s likelihood evaluations, which allows for efficient joint Bayesian estimation.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jean-François Bégin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Numerical approximation algorithms for pension funding

Author: 
Date created: 
2021-08-04
Abstract: 

It is difficult to find closed-form optimal decisions in the context of pension plans. Therefore, we often need to rely on numerical algorithms to find approximate optimal decisions. In this report, we present two numerical algorithms that can be applied to solve optimal pension funding problems: the value function approximation and the grid value approximation. The value function approximation method applies to models with infinite time horizons and approximates the parameters of the value function by minimizing the difference between the true and approximate evaluations of the Hamilton–Jacobi–Bellman (HJB) equation. The grid value approximation method is used for models with finite time horizons. It works iteratively with backward and forward stages and approximates the optimal decisions directly without using the HJB equation. Numerical results are presented to compare approximate and true solutions for optimal contributions and share in risky assets for classic problems in the pension literature.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jean-François Bégin
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Autoregressive mixed effects models and an application to annual income of cancer survivors

Author: 
Date created: 
2021-04-26
Abstract: 

Longitudinal observations of income are often strongly autocorrelated, even after adjusting for independent variables. We explore two common longitudinal models that allow for residual autocorrelation: 1. the autoregressive error model (a linear mixed effects model with an AR(1) covariance structure), and 2. the autoregressive response model (a linear mixed effects model that includes the first lag of the response variable as an independent variable). We explore the theoretical properties of these models and illustrate the behaviour of parameter estimates using a simulation study. Additionally, we apply the models to a data set containing repeated (annual) observations of income and sociodemographic variables on a sample of breast cancer survivors. Our preliminary results suggest that the autoregressive response model may severely overestimate the magnitude of the effect of cancer. Our findings will guide future, comprehensive study of the short- and long-term effects of a breast cancer diagnosis on a survivor’s annual net income.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Sequence clustering for genetic mapping of binary traits

Date created: 
2021-08-24
Abstract: 

Sequence relatedness has potential application to fine-mapping genetic variants contributing to inherited traits. We investigate the utility of genealogical tree-based approaches to fine-map causal variants in three different projects. In the first project, through coalescent simulation, we compare the ability of several popular methods of association mapping to localize causal variants in a sub-region of a candidate genomic region. We consider four broad classes of association methods, which we describe as single-variant, pooled-variant, joint-modelling and tree-based, under an additive genetic-risk model. We also investigate whether differentiating case sequences based on their carrier status for a causal variant can improve fine-mapping. Our results lend support to the potential of tree-based methods for genetic fine-mapping of disease. In the second project, we develop an R package to dynamically cluster a set of single-nucleotide variant sequences. The resulting partition structures provide important insight into the sequence relatedness. In the third project, we investigate the ability of methods based on sequence relatedness to fine-map rare causal variants and compare it to that of genotypic association methods. Since the true gene genealogy is unknown in practice, we apply the methods developed in the second project to estimate the sequence relatedness. We also pursue the idea of reclassifying case sequences into their carrier status using genealogical nearest neighbours. We find that methods based on sequence relatedness are competitive for fine-mapping rare causal variants. We propose some general recommendations for fine-mapping rare variants in case-control association studies.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Jinko Graham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Post-selection inference

Author: 
Date created: 
2021-04-21
Abstract: 

Forward stepwise selection is a widely used model selection algorithm. It is, however, hard to do inference for a model that has already been cherry-picked. A post-selection inference method called selective inference is investigated. Beginning with very simple examples and working towards more complex ones, we evaluate the method's performance in terms of its power and coverage probability through a simulation study. We investigate the target of inference and the impact of the amount of information used to construct conditional confidence intervals. To achieve the same level of coverage probability, the more conditions we use, the wider the confidence interval is; the effect can be extreme. Moreover, we investigate the impact of multiple conditioning, as well as the importance of the normality assumption on which the underlying theory is based. For models with not very many parameters (p << n), we find normality is not crucial in terms of the coverage probability.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Richard Lockhart
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An efficient approach to pruning regression trees using a modified Bayesian information criterion

Author: 
Date created: 
2021-04-14
Abstract: 

By identifying relationships between regression tree construction and change-point detection, we show that it is possible to prune a regression tree efficiently using properly modified information criteria. We prove that one of the proposed pruning approaches that uses a modified Bayesian information criterion consistently recovers the true tree structure provided that the true regression function can be represented as a subtree of a full tree. In practice, we obtain simplified trees that can have prediction accuracy comparable to trees obtained using standard cost-complexity pruning. We briefly discuss an extension to random forests that prunes trees adaptively in order to prevent excessive variance, building upon the work of other authors.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Thomas Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Some new methods and models in functional data analysis

Author: 
Date created: 
2020-06-19
Abstract: 

With new developments in modern technology, data are recorded continuously on a large scale over finer and finer grids. Such data push forward the development of functional data analysis (FDA), which analyzes information on curves or functions. Analyzing functional data is intrinsically an infinite-dimensional problem. The functional partial least squares method is a useful tool for dimension reduction. In this thesis, we propose a sparse version of the functional partial least squares method which is easy to interpret. Another problem of interest in FDA is the functional linear regression model, which extends the linear regression model to the functional context. We propose a new method to study the truncated functional linear regression model, which assumes that the functional predictor does not influence the response after the time passes a certain cutoff point. Motivated by a recent study of the instantaneous in-game win probabilities for the National Rugby League, we develop novel FDA techniques to determine the distributions in a Bayesian model.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Systematic comparison of designs and emulators for computer experiments using a library of test functions

Author: 
Date created: 
2020-12-16
Abstract: 

As computational resources have become faster and more economical, scientific research has transitioned from using only physical experiments to using simulation-based exploration. A body of literature has since grown aimed at the design and analysis of so-called computer experiments. While this literature is large and active, little work has been focused on comparing methods. This project presents ways of comparing and evaluating both design and emulation methods for computer experiments. Using a suite of test functions (in this work we introduce the Virtual Library of Computer Experiments), a procedure is established which can provide guidance as to how to proceed in simulation problems. An illustrative comparison is performed for each context, putting three emulators, then four experimental designs, up against each other, while also highlighting possible considerations for test function choice.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Derek Bingham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Efficient Bayesian parameter inference for COVID-19 transmission models

Author: 
Date created: 
2020-12-17
Abstract: 

Many transmission models have been proposed and adapted to reflect changes in policy for mitigating the spread of COVID-19. Often these models are applied without any formal comparison with previously existing models. In this project, we use an annealed sequential Monte Carlo (ASMC) algorithm to estimate parameters of these transmission models. We also use Bayesian model selection to provide a framework through which the relative performance of transmission models can be compared in a statistically rigorous manner. The ASMC algorithm provides an unbiased estimate of the marginal likelihood, which can be computed at no additional computational cost. This offers a significant computational advantage over MCMC methods, which require expensive post hoc computation to estimate the marginal likelihood. We find that ASMC can produce results that are comparable to MCMC in a fraction of the time.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.