Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Analysis of Spatio-Temporal Data for Forest Fire Control

Author: 
Date created: 
2015-01-23
Abstract: 

This project aims to establish the relationship of forest fire behavior with ecological/environmental factors, such as forest structure and weather. We analyze records of forest fires during the fire season (May to September) in 1992 from the Forest Fire Management Branch of the Ontario Ministry of Natural Resources (OMNR). We start with a preliminary analysis of the data, which includes a descriptive summary and an ordinary linear regression analysis with fire duration as the response. The preliminary analysis indicates that the fire weather index (FWI) used by Natural Resources Canada is the most relevant factor, together with fire location and starting time. We apply the semi-variogram and Moran's I, the conventional methods for exploring spatial patterns, and extend them to investigate spatio-temporal patterns in the fire data. Evaluations of the extended Moran's I statistic with the residuals of the ordinary linear regression analysis reveal a large departure from the independence and constant-variance assumptions on the random errors. This motivates two sets of partially linear regression models to accommodate possible nonlinear spatial/temporal patterns of the forest fires. We integrate univariate and bivariate kernel smoothing procedures with the least squares procedure for estimating the model parameters. Residual analysis indicates satisfactory fits in both sets of regression analyses. The partially linear regression analyses find that the association of fire duration with FWI varies across fire management zones and depends on the fire starting time.
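
The spatio-temporal extension of Moran's I developed in the project is not reproduced here, but the classical spatial statistic it builds on is simple to compute. The following is a minimal sketch, assuming inverse-distance weights truncated at a hypothetical bandwidth; the function name and arguments are illustrative only.

```python
import numpy as np

def morans_i(values, coords, bandwidth):
    """Classical Moran's I with inverse-distance weights truncated at `bandwidth`.

    values : (n,) observations, e.g. regression residuals
    coords : (n, 2) spatial locations
    """
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = len(z)
    pts = np.asarray(coords, dtype=float)

    # Pairwise distances and a simple truncated inverse-distance weight matrix.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    w = np.zeros_like(d)
    mask = (d > 0) & (d <= bandwidth)
    w[mask] = 1.0 / d[mask]

    # I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (n / w.sum()) * (w * np.outer(z, z)).sum() / (z ** 2).sum()
```

Values of the statistic close to its null expectation of -1/(n-1) suggest little spatial autocorrelation in the residuals; the project's extension incorporates the temporal dimension as well.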

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
X. Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Joint analysis of imaging and genomic data to identify associations related to cognitive impairment

Date created: 
2014-12-19
Abstract: 

Both genetic variants and brain region abnormalities are recognized to play a role in cognitive decline. In this project, we explore the relationship between genome-wide variation and region-specific rates of decline in brain structure, as measured by magnetic resonance imaging. The correspondence between rates of decline in brain regions and single nucleotide polymorphisms (SNPs) is investigated using data from the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1), a study of Alzheimer’s disease and mild cognitive impairment. In these data, the number of SNP and imaging biomarkers greatly exceeds the number of study subjects. To explore these data, we therefore look to modern multivariate statistical techniques that find sparse linear combinations of the two datasets having maximum correlation. These methods are particularly appealing because they greatly reduce the dimensions of the data, providing a low-dimensional representation of the data to explore. Regularization of the correlation structure through a “sparse” singular value decomposition makes multivariate analysis on a large set of biomarkers possible. Using sparse linear combinations of the two datasets also incorporates variable selection into the analysis, providing insight into which genetic variants are associated with cognitive decline. Resampling techniques are used to examine the validity of the results by exploring their reproducibility in independent test sets, and by assessing the stability of the variable selection.
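
The penalized, rank-one decomposition behind this kind of sparse multivariate analysis can be sketched with alternating soft-thresholded power iterations on the cross-product matrix. This is only an illustrative stand-in for the method used in the project, with hypothetical penalty values; a real analysis would tune the penalties and extract several components.

```python
import numpy as np

def soft_threshold(v, lam):
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sparse_rank1(X, Y, lam_u=0.1, lam_v=0.1, n_iter=100):
    """Sparse rank-one decomposition of the cross-product matrix X^T Y.

    Alternates soft-thresholded power iterations, in the spirit of a
    penalized ("sparse") singular value decomposition.  Columns of X and Y
    are assumed to be centered (and typically scaled).
    """
    C = X.T @ Y                       # cross-covariance between the two data sets
    u = np.random.default_rng(0).standard_normal(C.shape[0])
    u /= np.linalg.norm(u)
    for _ in range(n_iter):
        v = soft_threshold(C.T @ u, lam_v)
        if np.linalg.norm(v) > 0:
            v /= np.linalg.norm(v)
        u = soft_threshold(C @ v, lam_u)
        if np.linalg.norm(u) > 0:
            u /= np.linalg.norm(u)
    return u, v   # sparse weight vectors for the two sets of variables
```

Nonzero entries of u and v indicate which variables, e.g. SNPs and imaging measures, are selected into the correlated pair of linear combinations; penalties too large will threshold the vectors to zero.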

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jinko Graham
Mirza Faisal Beg
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Prediction and Calibration Using Outputs from Multiple Computer Simulators

Date created: 
2014-08-18
Abstract: 

Computer simulators are widely used to describe and explore physical processes. In some cases, several simulators, which can be of different or similar fidelities, are available for this task. A large part of this thesis focuses on combining observations and model runs from multiple computer simulators to build a predictive model for the real process. The resulting models can be used to perform sensitivity analysis for the system, solve inverse problems and make predictions. The approaches are Bayesian and are illustrated through a few simple examples, as well as a real application in predictive science at the Center for Radiative Shock Hydrodynamics at the University of Michigan. Although the computer model can be viewed as an inexpensive way to gain insight into the physical process, it can become computationally expensive to exercise the computer simulator repeatedly. A sequential design strategy is proposed to minimize the total number of function evaluations needed to find the global extremum. The practical implementation of the proposed approach is addressed and applied to several examples containing multiple computer codes.
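
The thesis's own sequential design strategy for multiple simulators is not reproduced here; the sketch below only illustrates the generic ingredients of such a strategy, a Gaussian-process surrogate and an expected-improvement acquisition rule for minimization, using scikit-learn. The function names and the candidate-set interface are assumptions made for the example.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expected_improvement(mu, sd, f_best):
    """Expected improvement for minimization, given GP predictive mean and sd."""
    sd = np.maximum(sd, 1e-12)
    z = (f_best - mu) / sd
    return (f_best - mu) * norm.cdf(z) + sd * norm.pdf(z)

def sequential_minimise(f, candidates, x_init, n_new=20):
    """Toy sequential design: refit a GP to the evaluations so far, then add
    the candidate point that maximizes expected improvement."""
    X = np.atleast_2d(x_init)
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
    for _ in range(n_new):
        gp.fit(X, y)
        mu, sd = gp.predict(candidates, return_std=True)
        x_next = candidates[np.argmax(expected_improvement(mu, sd, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()
```

Each iteration adds the point that best trades off a low predicted value against high predictive uncertainty, which is the usual way a sequential design economizes on expensive simulator runs.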

Document type: 
Thesis
File(s): 
Senior supervisor: 
Derek Bingham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Evaluating the Impact of Heteroscedasticity on the Predictive Ability of Modern Regression Techniques

Author: 
Date created: 
2014-08-22
Abstract: 

Over the last decade, the number and sophistication of methods used to perform regression on complex datasets have increased substantially. Despite this, our literature review found that research exploring the impact of heteroscedasticity on many widely used modern regression methods appears to be sparse. Our research therefore seeks to clarify the impact that heteroscedasticity has on the predictive effectiveness of modern regression methods. To achieve this objective, we begin by analyzing the ability of ten different modern regression methods to predict outcomes for three medium-sized data sets that each feature heteroscedasticity. We then use insights from this work to develop a simulation model and design an experiment that explores the impact that various factors have on the prediction accuracy of the ten regression methods. These factors include linearity, sparsity, the signal-to-noise ratio, the number of explanatory variables, and the use of a variance-stabilizing transformation.
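
The project's simulation experiment is not reproduced here. As a purely illustrative sketch of the kind of comparison described, the code below generates data whose error standard deviation grows with the mean (one hypothetical form of heteroscedasticity) and compares the test-set root mean squared prediction error of two off-the-shelf methods; the ten methods and the factor settings studied in the project are not specified in this sketch.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def simulate(n, p, rng, snr=2.0, hetero=1.0):
    """Linear mean; error SD increases with the mean (hypothetical heteroscedasticity)."""
    X = rng.uniform(0.0, 1.0, size=(n, p))
    mu = X @ np.ones(p)
    sd = (mu.std() / snr) * (1.0 + hetero * (mu - mu.min()) / np.ptp(mu))
    return X, mu + rng.normal(0.0, sd)

X_tr, y_tr = simulate(500, 10, np.random.default_rng(0))
X_te, y_te = simulate(5000, 10, np.random.default_rng(1))
for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    rmspe = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: RMSPE = {rmspe:.3f}")
```

Repeating such runs over a factorial grid of simulation settings is the usual way to turn this kind of comparison into a designed experiment.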

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Tom Loughin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Exploring Mental Health Related Emergency Department Visits: Frequency of Recurrence and Risk Factors

Author: 
Date created: 
2014-08-20
Abstract: 

This thesis project aims to provide insights into pediatric mental health care and to help improve its current practice. We explore records of mental health related emergency department visits by children and youth. The data are extracted from the provincial health administrative data systems of Alberta. We start with a descriptive data analysis, and then adopt the counting process framework to conduct statistical inference. A generalized (stratified) Cox regression model and a renewal process model are considered. We evaluate the frequency of recurrence and identify important risk factors under various model specifications. We also account for gaps in the visit process due to hospitalization. The project presents estimates of the model parameters via likelihood and partial likelihood approaches. Robust estimates and non-parametric bootstrap estimates of the standard errors of the parameter estimators are obtained in addition to the likelihood-based standard error estimates. We summarize the analysis and outline a few problems for future investigation in the final chapter.
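
A central building block of such recurrent-event analyses is the Cox partial likelihood written in counting-process (start, stop, event) form. The sketch below shows that piece only, with simple Breslow-style handling of ties; the stratification, gap handling, robust variances and bootstrap used in the project are not reproduced, and the variable names are illustrative.

```python
import numpy as np

def cox_partial_loglik(beta, start, stop, event, X):
    """Cox partial log-likelihood in counting-process (start, stop, event)
    format, as used for recurrent-event data.

    A record i is at risk at event time t if start[i] < t <= stop[i].
    Tied event times are handled in the simple Breslow fashion.
    """
    start, stop, event = map(np.asarray, (start, stop, event))
    eta = np.asarray(X) @ np.asarray(beta)
    loglik = 0.0
    for j in np.flatnonzero(event == 1):
        t = stop[j]
        at_risk = (start < t) & (t <= stop)     # records at risk just before t
        loglik += eta[j] - np.log(np.exp(eta[at_risk]).sum())
    return loglik
```

A point estimate of beta can be obtained by passing the negative of this function to a numerical optimizer such as scipy.optimize.minimize.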

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
X. Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Maximin Strong Orthogonal Arrays

Author: 
Date created: 
2014-08-18
Abstract: 

As space-filling designs, orthogonal arrays have been widely used, either directly or via OA-based Latin hypercubes, in computer experiments. He and Tang (2013) introduced and constructed a new class of arrays, strong orthogonal arrays, for computer experiments. Strong orthogonal arrays of strength t enjoy better space-filling properties than comparable orthogonal arrays of strength t in all dimensions lower than t. Given a single orthogonal array, many strong orthogonal arrays can be generated using the method of He and Tang (2013). We examine the selection of better strong orthogonal arrays using the maximin distance criterion, which attempts to place points in a design region so that no two points are too close together. In this project, we focus on maximin strong orthogonal arrays of strength three. For small designs, we apply a complete search. For large designs, the complete search is infeasible, and we propose a search algorithm that greatly reduces the computation time compared with the complete search approach. The performance of the algorithm is examined, and it is found to perform almost as well as the complete search.
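
The maximin comparison itself is straightforward. The sketch below is a minimal illustration rather than the project's search algorithm: it scores each candidate array by its smallest pairwise Euclidean distance and keeps the array for which that distance is largest.

```python
import numpy as np
from scipy.spatial.distance import pdist

def minimum_distance(design):
    """Smallest pairwise Euclidean distance of a design (rows = runs)."""
    return pdist(np.asarray(design, dtype=float)).min()

def best_by_maximin(designs):
    """From a collection of candidate arrays, keep the one whose smallest
    pairwise distance is largest (the maximin distance criterion)."""
    return max(designs, key=minimum_distance)
```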

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Boxin Tang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Optimal Designs of Two-Level Factorials when N ≡ 1 and 2 (mod 4) under a Baseline Parameterization

Date created: 
2014-08-18
Abstract: 

This work considers two-level factorial designs under a baseline parameterization, where the two levels are denoted by 0 and 1. Orthogonal parameterization is commonly used for two-level factorial designs, but in some cases the baseline parameterization is more natural. When only main effects are of interest, such designs are equivalent to biased spring balance weighing designs. Commonly, we assume that the interactions are negligible; if this is not the case, the non-negligible interactions will bias the main effect estimates. We review the minimum aberration criterion under the baseline parameterization, which is used to compare the sizes of the bias among different designs. We define a design as optimal if it has the minimum bias among the most efficient designs. Optimal designs for N ≡ 0 (mod 4), where N is the run size, were discussed by Mukerjee & Tang (2011). We continue this line of study by investigating optimal designs for the cases N ≡ 1 and 2 (mod 4). Searching for an optimal design among all possible designs is computationally very expensive, except for small N and m, where m is the number of factors. Cheng’s (2014) results are used to narrow down the search domain. We have done a complete search for small N and m. We find that one can directly use Cheng’s (2014) theorem to find an optimal design for the case N ≡ 1 (mod 4), but for the case N ≡ 2 (mod 4) a small modification is required.
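
Under the 0/1 baseline coding, the bias incurred by fitting only main effects when two-factor interactions are actually present can be read off an alias matrix. The sketch below computes that matrix for a candidate design; it is a generic illustration of the bias being compared, not the optimality criterion or the search procedure used in the project.

```python
import numpy as np
from itertools import combinations

def bias_matrix(D):
    """Alias (bias) matrix of the main-effect estimates under a baseline
    parameterization, for an N x m two-level design D with entries 0/1.

    Fitting the main-effects model y = Z1 b1 + error when two-factor
    interactions Z2 b2 are present gives
        E[b1_hat] = b1 + K b2,   with  K = (Z1'Z1)^{-1} Z1'Z2.
    """
    D = np.asarray(D, dtype=float)
    N, m = D.shape
    Z1 = np.column_stack([np.ones(N), D])                         # intercept + main effects
    Z2 = np.column_stack([D[:, i] * D[:, j]
                          for i, j in combinations(range(m), 2)])  # two-factor interactions
    return np.linalg.solve(Z1.T @ Z1, Z1.T @ Z2)
```

Smaller entries of K mean the main-effect estimates are less contaminated by non-negligible two-factor interactions, which is the quantity the aberration-type comparisons summarize.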

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Boxin Tang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Multiple-Decrement Compositional Forecasting with the Lee-Carter Model

Author: 
Date created: 
2014-07-10
Abstract: 

Changes in cause-of-death patterns have a great impact on health and social care costs paid by governments and insurance companies. Unfortunately, the overwhelming majority of methods for mortality projection are based on overall mortality, with only a few studies focusing on forecasting cause-specific mortality. In this project, our aim is to forecast the cause-specific death density with a coherent model. Since the cause-specific death density obeys a unit-sum constraint, it can be treated as compositional data. The most popular overall mortality forecasting model, the Lee-Carter model, is applied to the compositional cause-specific death density. The predicted cause-specific death density is then used in calculations for life insurance and an accidental death rider.
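
The classical Lee-Carter fit is a rank-one decomposition of centered log rates, obtainable from a singular value decomposition. The sketch below shows that baseline fit only; applying it to compositional data, as in the project, would first require a log-ratio transform of the cause-specific death densities, and forecasting would typically model the period index k_t as a random walk with drift.

```python
import numpy as np

def lee_carter(log_rates):
    """Classical Lee-Carter fit via SVD.

    log_rates : (ages, years) array of log rates, or a log-ratio transform of
    cause-specific death densities when the model is applied compositionally.
    Returns a_x (age pattern), b_x (age sensitivities) and k_t (period index)
    in log m_{x,t} ~ a_x + b_x * k_t.
    """
    a = log_rates.mean(axis=1)
    U, s, Vt = np.linalg.svd(log_rates - a[:, None], full_matrices=False)
    b = U[:, 0] / U[:, 0].sum()        # usual identifiability constraint: sum b_x = 1
    k = s[0] * Vt[0] * U[:, 0].sum()   # sum k_t = 0 holds because rows were centred
    return a, b, k
```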

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Gary Parker
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Bayesian Computational Methods and Applications

Author: 
Date created: 
2014-04-24
Abstract: 

The purpose of this thesis is to develop Bayesian methodology, together with the proper computational tools, to address two different problems. The first problem, which is more general from a methodological point of view, arises in computer experiments. We consider emulation of realizations of a monotone function at a finite set of inputs available from a computationally intensive simulator. We develop a Bayesian method for incorporating the monotonicity information in Gaussian process models, which are traditionally used as emulators. The resulting posterior in the monotone emulation setting is difficult to sample from because of the restrictions imposed by the monotonicity constraint. The difficulty of sampling from the constrained posterior motivated the development of a variant of sequential Monte Carlo samplers, introduced at the beginning of this thesis. Our proposed algorithm, which can be used in a variety of frameworks, is based on imposing the constraint in a sequential manner. We demonstrate the applicability of the sampler in two examples: one in inference for differential equation models and the other in approximate Bayesian computation. The second focus of the thesis is an application in the area of particle physics. The statistical procedures used in the search for a new particle are investigated, and an alternative Bayesian method is proposed that can address decision making and inference for a class of problems in this area. The sampling algorithm and components of the model used for this application are related to methods used in the first part of the thesis.
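
The idea of imposing a constraint gradually within a sequential Monte Carlo sampler can be illustrated on a toy target. The sketch below is not the thesis's algorithm for monotone Gaussian process emulation: it merely samples a standard normal restricted to the positive half-line by replacing the hard constraint with a soft indicator that is tightened over a hypothetical schedule of stages, with reweighting, resampling and a Metropolis move at each stage.

```python
import numpy as np
from scipy.stats import norm

def smc_soft_constraint(n_particles=2000, taus=(2.0, 1.0, 0.5, 0.1, 0.02), seed=1):
    """Toy SMC sampler that imposes the constraint x >= 0 sequentially.

    The hard constraint is replaced by the soft indicator Phi(x / tau); as tau
    shrinks across stages the soft target approaches the constrained one.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)        # start from the unconstrained target
    log_soft = lambda v, tau: norm.logcdf(v / tau)

    prev_tau = np.inf                           # Phi(x/inf) is constant: no constraint yet
    for tau in taus:
        # 1. reweight towards the tighter constraint
        logw = log_soft(x, tau) - log_soft(x, prev_tau)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # 2. resample (multinomial, for simplicity)
        x = rng.choice(x, size=n_particles, replace=True, p=w)
        # 3. random-walk Metropolis move targeting N(0,1) * Phi(x/tau)
        prop = x + 0.5 * rng.standard_normal(n_particles)
        log_alpha = (norm.logpdf(prop) + log_soft(prop, tau)
                     - norm.logpdf(x) - log_soft(x, tau))
        accept = np.log(rng.uniform(size=n_particles)) < log_alpha
        x = np.where(accept, prop, x)
        prev_tau = tau
    return x    # approximate draws from N(0,1) restricted to x >= 0
```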

Document type: 
Thesis
File(s): 
Senior supervisor: 
Richard Lockhart
Derek Bingham, Hugh Chipman, David Campbell
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Irregularly Spaced Time Series Data with Time Scale Measurement Error

Date created: 
2014-05-23
Abstract: 

This project can be divided into two main sections. The first section attempts to model irregularly spaced time series data where the time scale is measured with error. Modelling irregularly spaced time series data alone is quite challenging, as traditional time series techniques only handle equally/regularly spaced data. In addition, the measurement error in the time scale makes it even more challenging to incorporate measurement error models and functional approaches to model the time series. This project therefore takes a Bayesian approach to modelling a flexible regression function when the time scale is measured with error. The regression functions are modelled with regression P-splines, and the exploration of the posterior is carried out using a fully Bayesian method based on Markov chain Monte Carlo (MCMC) techniques. In the second section, we identify the relationship/dependency between two irregularly spaced time series data sets, each modelled using regression P-splines and a fully Bayesian method, using windowed moving correlations. The validity of the suggested methodology is explored using two simulations. It is then applied to two irregularly spaced time series data sets, each subject to measurement error in the time scale, to identify the dependency between them in terms of statistically significant correlations.
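
The windowed moving correlation used in the second section is easy to state once the two series have been reconstructed on a common, equally spaced grid (in the project this reconstruction comes from the Bayesian P-spline fits, which are not reproduced here). The sketch below computes a centred sliding-window Pearson correlation; the window length is a hypothetical tuning choice.

```python
import numpy as np

def moving_correlation(x, y, window):
    """Windowed (moving) Pearson correlation between two series that have
    already been interpolated/smoothed onto a common, equally spaced grid."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    out = np.full(n, np.nan)      # NaN where the window does not fit
    half = window // 2
    for i in range(half, n - half):
        xs = x[i - half:i + half + 1]
        ys = y[i - half:i + half + 1]
        out[i] = np.corrcoef(xs, ys)[0, 1]
    return out
```

Stretches where the windowed correlation stays far from zero point to periods in which the two series move together; in the project, significance is judged from the Bayesian analysis rather than from this raw summary alone.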

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Dave Campbell
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.