Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Covariance-adjusted, sparse, reduced-rank regression with adjustment for confounders

Author: 
Date created: 
2021-08-18
Abstract: 

There is evidence that common genetic variation in the gene NEDD9 is associated with developing Alzheimer’s Disease (AD). In this project, we study the relationship between brain-imaging biomarkers of AD and the gene NEDD9 while adjusting for the effects of genetic population structure. The data, collected by the Alzheimer’s Disease Neuroimaging Initiative (ADNI), consist of magnetic resonance imaging (MRI) measures of 56 brain regions of interest for 200 cognitively normal people, together with genetic data on single nucleotide polymorphisms (SNPs) from 33 candidate genes for AD. The standard solution to such a multiple-response problem is to fit a separate linear regression model for each response. This approach neglects the correlations among the 56 brain regions and possible sparsity in the SNP effects. We review a sparse, covariance-adjusted reduced-rank regression approach that selects important predictors and estimates the error covariance simultaneously, and we extend the approach to adjust for confounding variables. We apply the proposed algorithm to the ADNI data and to simulated data.
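
As a rough schematic of the kind of estimator reviewed above (not the essay's exact formulation), a covariance-adjusted, sparse, reduced-rank fit solves a problem of the form

    \min_{C,\,B,\,\Sigma}\ \operatorname{tr}\!\big[(Y - XC - ZB)\,\Sigma^{-1}(Y - XC - ZB)^{\top}\big] \;+\; n\log\det\Sigma \;+\; \lambda\sum_{j}\lVert C_{j\cdot}\rVert_{2}
    \quad\text{subject to}\quad \operatorname{rank}(C)\le r,

where Y is the n x 56 matrix of MRI measures, X the matrix of SNP genotypes, Z the confounders (e.g., population-structure covariates), the row-wise group penalty on C drops uninformative SNPs, and the rank constraint pools information across the correlated brain regions. The symbols and penalty form here are illustrative placeholders, not the essay's notation.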

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Brad McNeney
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Q-learning with online trees

Date created: 
2021-08-13
Abstract: 

Reinforcement learning is one of the major areas of artificial intelligence and has been studied intensively in recent years. Among its many methodologies, Q-learning is one of the most fundamental model-free reinforcement learning algorithms, and it has inspired many researchers. Several studies have shown strong results by approximating the action-value function, one of the essential elements of Q-learning, with non-linear supervised learning models such as deep neural networks. This combination has surpassed human-level performance on complex problems such as the Atari games and Go, which are difficult to solve with standard tabular Q-learning. However, both Q-learning and the deep neural networks typically used as function approximators require very large computational resources to train. To mitigate this, we propose using online random forests as the function approximator for the action-value function. We grow one online random forest for each possible action in a Markov decision process (MDP) environment. Each forest approximates the action-value function for its action, and the agent chooses the action in the succeeding state according to the resulting approximated action-value functions. When the agent executes an action, an observation consisting of the state, action, reward, and subsequent state is stored in an experience replay buffer. Observations are then randomly sampled from the buffer to grow the online random forests: for each sampled observation, the terminal nodes of the corresponding trees randomly generate candidate tests for decision-tree splits, and the test that gives the lowest residual sum of squares after splitting is selected. Trees grown in this way age each time they absorb a sampled observation; a tree older than a certain age may be selected at random and replaced by a new tree, depending on its out-of-bag error. Our algorithm is an adaptation of previously developed Online Random Forests to reinforcement learning, and forest size plays an important role: to reduce computational cost, we first grow small forests and then expand them after a certain number of episodes. In our experiments, this forest-size expansion improved performance in later episodes. Furthermore, our method outperformed some deep neural networks in simple MDP environments. We hope that this study promotes research on combining reinforcement learning with tree-based methods.
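
To make the general structure concrete, here is a minimal Python sketch of the idea (one regressor per action, an experience replay buffer, epsilon-greedy action selection, and bootstrapped Q-targets). It uses scikit-learn's batch RandomForestRegressor, refit on replayed samples, as a stand-in for the project's online random forests; the class name, parameters, and defaults are illustrative, not those of the project.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    class QForestAgent:
        """Q-learning with one forest per action and an experience replay buffer.
        A batch forest refit on replayed samples stands in for a true online forest."""

        def __init__(self, n_actions, gamma=0.99, epsilon=0.1, n_trees=10):
            self.n_actions = n_actions
            self.gamma = gamma
            self.epsilon = epsilon
            self.forests = [RandomForestRegressor(n_estimators=n_trees)
                            for _ in range(n_actions)]
            self.fitted = [False] * n_actions
            self.replay = []  # (state, action, reward, next_state, done) tuples

        def q_values(self, state):
            s = np.asarray(state, dtype=float).reshape(1, -1)
            return np.array([self.forests[a].predict(s)[0] if self.fitted[a] else 0.0
                             for a in range(self.n_actions)])

        def act(self, state):
            # Epsilon-greedy choice based on the approximated action values.
            if np.random.rand() < self.epsilon:
                return np.random.randint(self.n_actions)
            return int(np.argmax(self.q_values(state)))

        def remember(self, state, action, reward, next_state, done):
            self.replay.append((np.asarray(state, dtype=float), action, reward,
                                np.asarray(next_state, dtype=float), done))

        def update(self, batch_size=64):
            # Sample from the replay buffer and refit each action's forest on
            # bootstrapped Q-learning targets: r + gamma * max_a' Q(s', a').
            if len(self.replay) < batch_size:
                return
            idx = np.random.choice(len(self.replay), batch_size, replace=False)
            batch = [self.replay[i] for i in idx]
            for a in range(self.n_actions):
                rows = [(s, r, s2, d) for (s, act, r, s2, d) in batch if act == a]
                if not rows:
                    continue
                X = np.array([s for s, _, _, _ in rows])
                y = np.array([r if d else r + self.gamma * self.q_values(s2).max()
                              for _, r, s2, d in rows])
                self.forests[a].fit(X, y)
                self.fitted[a] = True

In the project itself the forests are grown online, with sampled observations updating candidate splits at the leaves and aged trees replaced according to their out-of-bag error, rather than being refit from scratch as in this sketch.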

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Lloyd T. Elliott
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Exploring out of distribution: Deep neural networks and the human brain

Author: 
Date created: 
2021-08-17
Abstract: 

Deep neural networks have achieved state-of-the-art performance across a wide range of tasks. Convolutional neural networks, with their ability to learn complex spatial features, have surpassed human-level accuracy on many image classification problems. However, these architectures are still often unable to make accurate predictions when the test data distribution differs from that of the training data. In contrast, humans naturally excel at such out-of-distribution generalization. Novel solutions have been developed to improve a deep neural network's ability to handle out-of-distribution data; methods such as Push-Pull and AugMix have improved model robustness and generalization. We are interested in assessing whether such models achieve the most human-like generalization across a wide variety of image classification tasks, and we identify AugMix as the most human-like deep neural network under our set of benchmarks. Identifying such models sheds light on human cognition and on the analogy between neural networks and the human brain. We also show that, contrary to our intuition, transfer learning worsens the performance of Push-Pull.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Lloyd T. Elliott
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

On the Bayesian estimation of jump-diffusion models in finance

Date created: 
2021-05-19
Abstract: 

The jump-diffusion framework introduced by Duffie et al. (2000) encompasses most one-factor models used in finance. Due to the complexity of this framework, the particle filter (e.g., Hurn et al., 2015; Jacobs & Liu, 2018) and combinations of Gibbs and Metropolis-Hastings samplers (e.g., Eraker et al., 2003; Eraker, 2004) have been the tools of choice for its estimation. However, Bégin & Boudreault (2020) recently showed that the discrete nonlinear filter (DNF) of Kitagawa (1987) can also be used for fast and accurate maximum likelihood estimation of jump-diffusion models. In this project report, we combine the DNF with Markov chain Monte Carlo (MCMC) methods for Bayesian estimation in the spirit of the particle MCMC algorithm of Andrieu et al. (2010). In addition, we show that derivative prices (i.e., European option prices) can easily be included in the DNF’s likelihood evaluations, which allows for efficient joint Bayesian estimation.
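
A schematic of this estimation strategy (not the report's actual implementation) is a random-walk Metropolis-Hastings sampler in which the likelihood of each proposed parameter vector is evaluated by a filter; in the report that filter is the DNF, while the loglik and log_prior functions below are hypothetical placeholders to be supplied by the user.

    import numpy as np

    def metropolis_hastings(loglik, log_prior, theta0, n_iter=5000, step=0.05, seed=0):
        """Random-walk Metropolis-Hastings sampler in which loglik(theta) is the
        log-likelihood returned by a filter (e.g., the DNF evaluated on returns,
        possibly augmented with the contribution of observed option prices)."""
        rng = np.random.default_rng(seed)
        theta = np.asarray(theta0, dtype=float)
        logpost = loglik(theta) + log_prior(theta)
        draws = np.empty((n_iter, theta.size))
        for i in range(n_iter):
            proposal = theta + step * rng.standard_normal(theta.size)
            logpost_prop = loglik(proposal) + log_prior(proposal)
            # Accept with probability min(1, posterior ratio).
            if np.log(rng.uniform()) < logpost_prop - logpost:
                theta, logpost = proposal, logpost_prop
            draws[i] = theta
        return draws

As the abstract notes, joint estimation with derivative prices then amounts to adding the option-pricing error likelihood inside loglik, so the same sampler applies unchanged.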

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jean-François Bégin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Numerical approximation algorithms for pension funding

Author: 
Date created: 
2021-08-04
Abstract: 

It is difficult to find closed-form optimal decisions in the context of pension plans. Therefore, we often need to rely on numerical algorithms to find approximate optimal decisions. In this report, we present two numerical algorithms that can be applied to solve optimal pension funding problems: value function approximation and grid value approximation. The value function approximation method applies to models with infinite time horizons and approximates the parameters of the value function by minimizing the difference between the true and approximate evaluations of the Hamilton–Jacobi–Bellman (HJB) equation. The grid value approximation method is used for models with finite time horizons; it works iteratively with backward and forward stages and approximates the optimal decisions directly, without using the HJB equation. Numerical results are presented to compare approximate and true solutions for the optimal contribution and the share invested in risky assets in classic problems from the pension literature.
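
Schematically, and not in the report's notation, the value function approximation method replaces the unknown value function V with a parametric family V_theta and chooses theta so that the HJB equation holds as closely as possible at a set of sample states x_1, ..., x_m:

    \hat{\theta} \;=\; \arg\min_{\theta}\ \sum_{i=1}^{m}\Big(\rho\,V_{\theta}(x_i)\;-\;\sup_{u}\big\{f(x_i,u)+\mathcal{L}^{u}V_{\theta}(x_i)\big\}\Big)^{2},

where rho is the discount rate, f(x, u) the instantaneous objective, and L^u the generator of the state dynamics under decision u (here, the contribution rate and the share invested in risky assets). These symbols are illustrative placeholders for the report's actual model.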

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jean-François Bégin
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Autoregressive mixed effects models and an application to annual income of cancer survivors

Author: 
Date created: 
2021-04-26
Abstract: 

Longitudinal observations of income are often strongly autocorrelated, even after adjusting for independent variables. We explore two common longitudinal models that allow for residual autocorrelation: (1) the autoregressive error model, a linear mixed-effects model with an AR(1) error covariance structure; and (2) the autoregressive response model, a linear mixed-effects model that includes the first lag of the response variable as an independent variable. We explore the theoretical properties of these models and illustrate the behaviour of the parameter estimates using a simulation study. Additionally, we apply the models to a data set containing repeated (annual) observations of income and sociodemographic variables on a sample of breast cancer survivors. Our preliminary results suggest that the autoregressive response model may severely overestimate the magnitude of the effect of cancer. Our findings will guide a future, comprehensive study of the short- and long-term effects of a breast cancer diagnosis on a survivor’s annual net income.
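
In generic notation (a sketch, not the project's exact specification), with y_{it} the income of subject i in year t, covariates x_{it}, and a random intercept b_i ~ N(0, sigma_b^2), the two models are

    (1)\ \ y_{it} = x_{it}^{\top}\beta + b_i + \varepsilon_{it}, \qquad \varepsilon_{it} = \rho\,\varepsilon_{i,t-1} + \eta_{it},\ \ \eta_{it}\sim N(0,\sigma^{2});
    (2)\ \ y_{it} = \gamma\,y_{i,t-1} + x_{it}^{\top}\beta + b_i + \varepsilon_{it},\ \ \varepsilon_{it}\sim N(0,\sigma^{2}),

so the residual autocorrelation enters through rho in the autoregressive error model and through the lagged-response coefficient gamma in the autoregressive response model.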

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Sequence clustering for genetic mapping of binary traits

Date created: 
2021-08-24
Abstract: 

Sequence relatedness has potential application to fine-mapping genetic variants contributing to inherited traits. We investigate the utility of genealogical tree-based approaches to fine-map causal variants in three projects. In the first project, through coalescent simulation, we compare the ability of several popular methods of association mapping to localize causal variants within a sub-region of a candidate genomic region. We consider four broad classes of association methods, which we describe as single-variant, pooled-variant, joint-modelling and tree-based, under an additive genetic-risk model. We also investigate whether differentiating case sequences based on their carrier status for a causal variant can improve fine-mapping. Our results lend support to the potential of tree-based methods for genetic fine-mapping of disease. In the second project, we develop an R package to dynamically cluster a set of single-nucleotide variant sequences. The resulting partition structures provide important insight into sequence relatedness. In the third project, we investigate the ability of methods based on sequence relatedness to fine-map rare causal variants and compare them to genotypic association methods. Since the true gene genealogy is unknown in practice, we apply the methods developed in the second project to estimate sequence relatedness. We also pursue the idea of reclassifying case sequences by their carrier status using genealogical nearest neighbours. We find that methods based on sequence relatedness are competitive for fine-mapping rare causal variants, and we propose some general recommendations for fine-mapping rare variants in case-control association studies.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Jinko Graham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Post-selection inference

Author: 
Date created: 
2021-04-21
Abstract: 

Forward stepwise selection is a widely used model-selection algorithm. It is, however, hard to do inference for a model that has been cherry-picked by the selection procedure. We investigate a post-selection inference method called selective inference. Beginning with very simple examples and working towards more complex ones, we evaluate the method's performance in terms of its power and coverage probability through a simulation study. We examine the target of inference and the impact of the amount of information used to construct conditional confidence intervals. To achieve the same level of coverage probability, the more conditions we use, the wider the confidence interval becomes; the effect can be extreme. Moreover, we investigate the impact of multiple conditioning, as well as the importance of the normality assumption on which the underlying theory is based. For models with not very many parameters (p << n), we find that normality is not crucial in terms of the test coverage probability.
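
For context, the standard polyhedral formulation from the selective-inference literature (e.g., Lee et al., 2016; Tibshirani et al., 2016) underlies intervals of this type, although the exact conditioning sets vary across our experiments. With y ~ N(mu, sigma^2 I), the event that forward stepwise selects a given model (and sign pattern) can be written as {Ay <= b}, and for a contrast eta,

    \eta^{\top}y \;\big|\; \{Ay \le b\} \;\sim\; \mathrm{TN}\!\left(\eta^{\top}\mu,\ \sigma^{2}\lVert\eta\rVert^{2},\ [\mathcal{V}^{-}(y),\ \mathcal{V}^{+}(y)]\right),

where TN denotes a normal distribution truncated to the stated interval and the limits depend on y only through its component orthogonal to eta. Inverting this truncated-normal pivot yields the conditional confidence intervals studied above; conditioning on more of the selection event generally produces wider intervals, consistent with the findings reported here.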

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Richard Lockhart
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An efficient approach to pruning regression trees using a modified Bayesian information criterion

Author: 
Date created: 
2021-04-14
Abstract: 

By identifying relationships between regression tree construction and change-point detection, we show that it is possible to prune a regression tree efficiently using properly modified information criteria. We prove that one of the proposed pruning approaches that uses a modified Bayesian information criterion consistently recovers the true tree structure provided that the true regression function can be represented as a subtree of a full tree. In practice, we obtain simplified trees that can have prediction accuracy comparable to trees obtained using standard cost-complexity pruning. We briefly discuss an extension to random forests that prunes trees adaptively in order to prevent excessive variance, building upon the work of other authors.
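
For orientation only, information-criterion pruning selects the subtree of the full tree T_max minimizing a penalized fit criterion of the generic form below, where n is the number of observations and |T| the number of terminal nodes of a candidate subtree T; the specific modification of the BIC penalty that yields the consistency result is developed in the essay and is not reproduced here.

    \hat{T} \;=\; \arg\min_{T \subseteq T_{\max}}\ \Big\{ n\log\!\big(\mathrm{RSS}(T)/n\big) \;+\; \mathrm{pen}(n)\,\lvert T\rvert \Big\},

where pen(n) = log n corresponds to the usual BIC; the essay modifies this penalty, drawing on results from change-point detection.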

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Thomas Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Some new methods and models in functional data analysis

Author: 
Date created: 
2020-06-19
Abstract: 

With new developments in modern technology, data are recorded continuously on a large scale over finer and finer grids. Such data push forward the development of functional data analysis (FDA), which analyzes information on curves or functions. Analyzing functional data is intrinsically an infinite-dimensional problem. The functional partial least squares method is a useful tool for dimension reduction. In this thesis, we propose a sparse version of the functional partial least squares method that is easy to interpret. Another problem of interest in FDA is the functional linear regression model, which extends the linear regression model to the functional context. We propose a new method for the truncated functional linear regression model, which assumes that the functional predictor does not influence the response after the time passes a certain cutoff point. Motivated by a recent study of instantaneous in-game win probabilities for the National Rugby League, we also develop novel FDA techniques to determine the distributions in a Bayesian model.
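
In generic notation (a sketch, not the thesis's formulation), the truncated functional linear model described above posits, for a scalar response Y_i and a functional predictor X_i(t) observed on [0, T],

    Y_i \;=\; \alpha \;+\; \int_{0}^{\delta} X_i(t)\,\beta(t)\,dt \;+\; \varepsilon_i, \qquad 0 < \delta \le T,

so that the coefficient function satisfies beta(t) = 0 for t > delta, and both the cutoff delta and beta must be estimated from the data.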

Document type: 
Thesis
File(s): 
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.