Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Bayesian methodology for latent function modeling in applied physics and engineering

Date created: 
2017-12-20
Abstract: 

Computer simulators play a key role in modern science and engineering as tools for understanding and exploring physical systems, and calibration and validation are important parts of their use. Calibration is necessary for assessing the predictive capability of a model with fully quantified sources of uncertainty, and field observations of physical systems often come in diverse types. New methodology for calibration with a generalized measurement-error structure is proposed and applied to the parallel deterministic transport model of the Center for Exascale Radiation Transport at Texas A&M University. Validation of computer models is critical for building trust in a simulator; we propose new methodology for model validation using goodness-of-fit hypothesis tests in a Bayesian model-assessment framework. Lastly, a hidden Markov model with a particle filter is proposed for detecting anomalies in time series, with the aim of identifying intrusions in cyber-physical networks.
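
For illustration, here is a minimal sketch of Bayesian calibration with a non-Gaussian (Poisson) measurement-error structure, one instance of the "generalized measurement error" idea; the toy simulator, data, and sampler settings below are hypothetical stand-ins, not the CERT transport model.

```python
# Minimal sketch of Bayesian calibration with a non-Gaussian error model
# (hypothetical toy example; not the CERT transport simulator).
import numpy as np

rng = np.random.default_rng(0)

def simulator(x, theta):
    # Toy deterministic simulator standing in for an expensive code.
    return np.exp(-theta * x) * 10.0

# Synthetic field data: counts with Poisson measurement error.
x_obs = np.linspace(0.1, 2.0, 15)
y_obs = rng.poisson(simulator(x_obs, theta=1.3))

def log_post(theta):
    if theta <= 0:
        return -np.inf
    lam = simulator(x_obs, theta)
    # Poisson log-likelihood (a non-Gaussian error structure) + flat prior.
    return np.sum(y_obs * np.log(lam) - lam)

# Random-walk Metropolis over the calibration parameter.
theta, lp = 1.0, log_post(1.0)
draws = []
for _ in range(5000):
    prop = theta + 0.1 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    draws.append(theta)

print("posterior mean of theta:", np.mean(draws[1000:]))
```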

Document type: 
Thesis
Supervisor(s): 
Derek Bingham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Ranking and prediction for Cycling Canada

Date created: 
2017-12-14
Abstract: 

In an effort to improve Canadian performance in the men's Elite UCI Mountain Bike World Cup, researchers from the Canadian Sport Institute Ontario (CSIO) presented us with a specific problem: they had a wealth of race data but were unsure how best to extract insights from it. We responded by building an interactive user interface with R Shiny to obtain rider rankings. Estimation was carried out via maximum likelihood using the Bradley-Terry model. We improved on the existing literature by proposing an exponentially weighted version of the model and determining an optimal weighting parameter through cross-validation against the results of future races; the proposed methods therefore provide forecasting capability. The tuned Bradley-Terry estimator performed better than the UCI point-based ranking in terms of predictive error. This implementation of the Bradley-Terry model with a user-friendly graphical interface gives broader scientific audiences easy access to Bradley-Terry ranking for prediction in racing sports.
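
For illustration, a minimal sketch of an exponentially weighted Bradley-Terry fit by maximum likelihood; the race results, rider count, and weight value below are hypothetical, not the CSIO data or the authors' exact implementation.

```python
# Minimal sketch of an exponentially time-weighted Bradley-Terry model
# (hypothetical data; follows the idea in the abstract, not the exact code).
import numpy as np
from scipy.optimize import minimize

# Pairwise results: (winner_index, loser_index, races_ago).
results = [(0, 1, 0), (0, 2, 1), (1, 2, 1), (2, 0, 3), (1, 0, 4)]
n_riders = 3
alpha = 0.8  # exponential weight per race back in time (tuning parameter)

def neg_log_lik(beta):
    beta = beta - beta.mean()          # identifiability constraint
    nll = 0.0
    for w, l, age in results:
        p_win = 1.0 / (1.0 + np.exp(beta[l] - beta[w]))
        nll -= (alpha ** age) * np.log(p_win)
    return nll

fit = minimize(neg_log_lik, np.zeros(n_riders), method="BFGS")
strengths = fit.x - fit.x.mean()
print("estimated rider strengths:", strengths)
```

In the project, the weighting parameter would be tuned by cross-validation against the outcomes of future races rather than fixed in advance.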

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Bayesian Integration for Assessing the Quality of the Laplace Approximation

Date created: 
2017-11-24
Abstract: 

Nuisance parameters increase in number as more data are collected. In dynamic models, this typically results in more parameters than observations, making direct estimation intractable. The Laplace approximation is the standard tool for approximating the high-dimensional integral required to marginalize over the nuisance parameters. However, the Laplace approximation relies on asymptotic arguments that do not apply to nuisance parameters, whose number grows with the sample size. Existing ways of assessing the quality of the Laplace approximation rely on much slower MCMC-based methods. In this work, a probabilistic integration approach is used to develop a diagnostic for the quality of the Laplace approximation.
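
For illustration, a minimal sketch of a one-dimensional Laplace approximation checked against a reference integral; here ordinary numerical quadrature stands in for the probabilistic integration used in the project, and the integrand is a hypothetical example.

```python
# Minimal sketch: Laplace approximation of a one-dimensional marginal
# integral, checked against numerical quadrature. (Illustrative stand-in;
# the project uses probabilistic integration, not scipy quadrature.)
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

def neg_log_integrand(u):
    # Skewed log-integrand: the Laplace approximation degrades with skewness.
    return 0.5 * u**2 + 0.3 * u**3 * np.exp(-u**2)

# Laplace: locate the mode and use the curvature there.
u_hat = minimize_scalar(neg_log_integrand).x
eps = 1e-4
curv = (neg_log_integrand(u_hat + eps) - 2 * neg_log_integrand(u_hat)
        + neg_log_integrand(u_hat - eps)) / eps**2
laplace = np.exp(-neg_log_integrand(u_hat)) * np.sqrt(2 * np.pi / curv)

# Reference value by quadrature for comparison.
exact, _ = quad(lambda u: np.exp(-neg_log_integrand(u)), -10, 10)
print(f"Laplace {laplace:.5f} vs quadrature {exact:.5f}")
```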

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
David Alexander Campbell
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Forecasting Batting Averages in MLB

Date created: 
2017-11-14
Abstract: 

We consider new baseball data from Statcast, which includes launch angle, launch velocity, and hit distance for batted balls in Major League Baseball during the 2015 and 2016 seasons. Using logistic regression, we train two models on 2015 data to estimate the probability that a player gets a hit on each of their 2015 at-bats. For each player, we sum these predictions and divide by their total at-bats to predict their 2016 batting average. We then use linear regression to express actual 2016 batting averages as a linear combination of the 2016 Statcast-based predictions and the 2016 PECOTA predictions. When using this procedure to obtain 2017 predictions, we find that the combined prediction performs better than PECOTA. This information may be used to make better predictions of batting averages in future seasons.
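
For illustration, a minimal sketch of the two-stage procedure: a per-at-bat hit-probability model, aggregation to a predicted batting average, then a linear blend with PECOTA. All data below are synthetic and the feature layout is an assumption, not the actual Statcast files.

```python
# Minimal sketch of the two-stage batting-average prediction
# (synthetic data; hypothetical features, not the real Statcast columns).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(1)

# Stage 1: per at-bat hit model from launch angle/velocity/distance.
X_2015 = rng.normal(size=(2000, 3))            # angle, velocity, distance
y_2015 = rng.binomial(1, 0.27, size=2000)      # hit / no hit
hit_model = LogisticRegression().fit(X_2015, y_2015)

# A player's predicted average = sum of hit probabilities / at-bats.
player_ab = rng.normal(size=(450, 3))
statcast_avg = hit_model.predict_proba(player_ab)[:, 1].sum() / len(player_ab)
print("Statcast-based predicted average:", round(statcast_avg, 3))

# Stage 2: blend Statcast-based and PECOTA predictions by linear regression.
statcast_preds = rng.uniform(0.22, 0.30, 100)
pecota_preds = rng.uniform(0.22, 0.30, 100)
actual_avgs = 0.5 * statcast_preds + 0.5 * pecota_preds + rng.normal(0, 0.01, 100)
blend = LinearRegression().fit(
    np.column_stack([statcast_preds, pecota_preds]), actual_avgs)
print("blend weights:", blend.coef_, "intercept:", blend.intercept_)
```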

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Timothy Swartz
Jason Loeppky
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Estimating the conditional intensity function of a neural spike train by particle Markov chain Monte Carlo and smoothing

Date created: 
2017-08-14
Abstract: 

Understanding neural activity is fundamental and challenging in decoding how the brain processes information. An essential part of the problem is to define a meaningful, quantitative characterization of neural activity when it is represented by a sequence of action potentials, or a neural spike train. This thesis represents a neural spike train as a point process, a representation that provides a conditional intensity function (CIF) describing the neural activity. An estimation procedure for the CIF, based on particle Markov chain Monte Carlo (PMCMC) and smoothing, is introduced and applied to a real data set. From the estimated CIF of a neural spike train and its derivative, we can observe adaptation behavior. A simulation study verifies that the estimation procedure provides reliable estimates of the CIF. This framework provides a definite quantification of neural activity and facilitates further investigation of the brain from a neurological perspective.
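
For illustration, a minimal sketch of the point-process view of a spike train: spikes are simulated from an assumed conditional intensity function by thinning. The CIF, rates, and refractory effect below are hypothetical, and the PMCMC estimation step itself is not shown.

```python
# Minimal sketch of the point-process view of a spike train: simulate spikes
# from a conditional intensity function by Ogata thinning. (Illustration of
# the CIF representation only; hypothetical rates, no PMCMC estimation.)
import numpy as np

rng = np.random.default_rng(2)

def cif(t, spikes, base=20.0, refractory=0.02):
    # Baseline rate suppressed right after the last spike (adaptation-like).
    if spikes and t - spikes[-1] < refractory:
        return 0.1 * base
    return base

T, lam_max = 2.0, 25.0   # lam_max must dominate the CIF everywhere
t, spikes = 0.0, []
while t < T:
    t += rng.exponential(1.0 / lam_max)        # candidate event time
    if t < T and rng.uniform() < cif(t, spikes) / lam_max:
        spikes.append(t)                       # accept with prob cif(t)/lam_max

print(f"{len(spikes)} spikes in {T} s; mean rate {len(spikes)/T:.1f} Hz")
```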

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Adjusting for Scorekeeper Bias in NBA Box Scores

Date created: 
2017-06-01
Abstract: 

Box score statistics in the National Basketball Association are used to measure and evaluate player performance. Some of these statistics are subjective in nature, and since box score statistics are recorded by scorekeepers hired by the home team for each game, there exists potential for inconsistency and bias. These inconsistencies can have far-reaching consequences, particularly with the rise in popularity of daily fantasy sports. Using box score data, we estimate models that quantify both the bias and the generosity of each scorekeeper for two of the most subjective statistics: assists and blocks. We then use optical player tracking data for the 2015-2016 season to improve the assist model by including contextual spatio-temporal variables such as time of possession, player locations, and distance traveled. From this model, we present results measuring the impact of the scorekeeper and of the other contextual variables on the probability of a pass being recorded as an assist. Results for adjusting season assist totals to remove scorekeeper influence are also presented.
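
For illustration, a minimal sketch of a scorekeeper-adjusted assist model: a logistic regression with scorekeeper indicators alongside contextual covariates. The synthetic data and effect sizes below are hypothetical, and the project's model is considerably richer.

```python
# Minimal sketch of a scorekeeper-adjusted assist model
# (synthetic data; hypothetical effects, not the paper's full model).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 5000
scorekeeper = rng.integers(0, 30, n)           # 30 home-team scorekeepers
time_of_poss = rng.exponential(2.0, n)         # seconds before the shot
pass_dist = rng.uniform(1, 25, n)              # feet

# Generous scorekeepers (ids < 5) record borderline passes as assists.
logit = 1.0 - 0.4 * time_of_poss - 0.03 * pass_dist + 0.8 * (scorekeeper < 5)
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# One-hot scorekeeper effects plus contextual covariates.
X = np.column_stack([np.eye(30)[scorekeeper], time_of_poss, pass_dist])
model = LogisticRegression(max_iter=1000).fit(X, y)
print("scorekeeper effect range:",
      model.coef_[0][:30].min(), "to", model.coef_[0][:30].max())
```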

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Luke Bornn
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Differences in Prescription Drug Use Among 5-Year Survivors of Childhood, Adolescent, and Young Adult Cancer and the General Population in British Columbia, Canada

Date created: 
2017-07-13
Abstract: 

In this project, we analyze the prescription drug use of childhood, adolescent, and young adult cancer survivors identified by the CAYACS program in BC. Understanding the patterns of prescription use, and the factors associated with the tendency to be on prescriptions, is important to policy makers and health care planners. Since data on actual prescription usage are not available, we use prescription dispensing data as a proxy. We examine the differences in prescription use between survivors and matched controls selected from the general population, and assess the impact of age and other clinical and sociodemographic factors on prescription use. Specifically, we model subjects' on-/off-prescription status with a first-order Markov transition model, and handle between-subject heterogeneity using a random effect. Our method captures the differences in prescription drug use between survivors and the general population, as well as differences within the survivor population. Our results show that survivors tend to have a higher probability of going on prescriptions than the general population over the course of their lifetimes; further, females appear to have a higher probability of going on prescriptions than males. A simulation study is conducted to assess the performance of the model's estimators.
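
For illustration, a minimal sketch of a first-order Markov transition model with logistic transition probabilities. The data below are synthetic, and the subject-level random effect used in the project is omitted for brevity.

```python
# Minimal sketch of a first-order Markov transition model for on/off
# prescription status (synthetic data; the project's random effect omitted).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_subj, n_months = 300, 24
survivor = rng.binomial(1, 0.5, n_subj)

rows, y_list = [], []
for i in range(n_subj):
    state = 0
    for t in range(n_months):
        # P(on prescription next month) depends on the current state
        # and on survivor status.
        logit = -2.0 + 2.5 * state + 0.8 * survivor[i]
        new = rng.binomial(1, 1 / (1 + np.exp(-logit)))
        rows.append([state, survivor[i]])
        y_list.append(new)
        state = new

X, y = np.array(rows), np.array(y_list)
fit = LogisticRegression().fit(X, y)
print("coef (previous state, survivor):", fit.coef_[0])
```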

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Distributions of Time to First Spot Fire

Date created: 
2017-08-15
Abstract: 

In wildfire management, a spot fire is the result of an airborne ember igniting a separate fire away from the main wildfire. Under certain environmental and wildfire conditions, a burning ember can breach a fuel break, such as a river or road, and produce a spot fire. This project derives distributions of the time to the first spot fire in various situations and verifies them by simulation. To demonstrate the implementation of the distributions in practice, we incorporate a stochastic fire spread model. This research assesses the likelihood of a spot fire occurring past a fuel break while taking into account both spotting distance and spotting rate. This contrasts with the traditional approach, which involves only the maximal spotting distance, and can serve as a tool for fire management.
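
For illustration, a minimal sketch of one such derivation: if embers are launched as a Poisson process and each independently travels past the fuel break with some probability, the thinned breach process is again Poisson, so the time to the first spot fire is exponential. The rates and distance distribution below are hypothetical, not the project's fire-spread model.

```python
# Minimal sketch: time to the first ember that breaches a fuel break
# (toy rates and distances; not the project's stochastic spread model).
import numpy as np

rng = np.random.default_rng(5)
spot_rate = 3.0          # embers launched per hour
break_dist = 500.0       # fuel break distance in metres
mean_travel = 200.0      # mean ember travel distance (exponential)

p_breach = np.exp(-break_dist / mean_travel)   # P(distance > break)
rate_breach = spot_rate * p_breach             # thinned Poisson rate

def first_spot_time():
    # Time of the first ember that lands past the break.
    t = 0.0
    while True:
        t += rng.exponential(1.0 / spot_rate)
        if rng.exponential(mean_travel) > break_dist:
            return t

sims = np.array([first_spot_time() for _ in range(20000)])
print("simulated mean:", sims.mean(), "theory:", 1.0 / rate_breach)
```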

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Joan Hu
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Mendelian randomization for causal inference of the relationship between obesity and 28-day survival following septic shock

Date created: 
2017-08-10
Abstract: 

Septic shock is a leading cause of death in intensive care units. Septic shock occurs when a body-wide infection leads to low blood pressure, and ultimately organ failure. Some recent studies suggest that overweight and obese patients have a better chance of survival following septic shock than normal or underweight patients. In this project we apply Mendelian randomization to assess whether the observed obesity effect on 28-day survival following septic shock is causal or more likely due to unmeasured confounding variables. Mendelian randomization is an instrumental variables approach that uses genetic markers as instruments. Under modelling assumptions, unconfounded estimates of the obesity effect can be obtained by fitting a model for 28-day survival that includes a residual obesity term. Data for the project comes from the Vasopressin and Septic Shock Trial (VASST). Our analysis suggests that the observed obesity effect on survival following septic shock is not causal.
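
For illustration, a minimal sketch of the residual-inclusion instrumental-variables idea: regress BMI on a genetic instrument, then include the residual in the outcome model so that the remaining BMI coefficient is unconfounded under the modelling assumptions. The data below are synthetic, not the VASST trial.

```python
# Minimal sketch of Mendelian randomization via residual inclusion
# (synthetic data; hypothetical effect sizes, not VASST).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 2000
snp = rng.binomial(2, 0.3, n)                  # genetic instrument (0/1/2)
confounder = rng.normal(size=n)                # unmeasured
bmi = 25 + 1.5 * snp + confounder + rng.normal(size=n)

# Outcome depends on the confounder but NOT causally on BMI.
p_surv = 1 / (1 + np.exp(-(0.5 + 0.8 * confounder)))
surv28 = rng.binomial(1, p_surv)

# Stage 1: first-stage regression of BMI on the instrument.
stage1 = sm.OLS(bmi, sm.add_constant(snp)).fit()
resid = bmi - stage1.fittedvalues

# Stage 2: survival model including BMI and the residual term.
X = sm.add_constant(np.column_stack([bmi, resid]))
stage2 = sm.Logit(surv28, X).fit(disp=0)
print(stage2.params)   # BMI coefficient near zero => no causal effect here
```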

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Brad McNeney
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

A Multi-Dimensional Bühlmann Credibility Approach to Modeling Multi-Population Mortality Rates

Date created: 
2017-06-08
Abstract: 

In this project, we first propose a multi-dimensional Bühlmann credibility approach to forecasting mortality rates for multiple populations, and then compare its forecasting performance with that of the joint-k, co-integrated, and augmented common factor Lee-Carter models. The model is applied to mortality data from the Human Mortality Database for both genders in three well-developed countries, over a given age span and a wide range of fitting year spans. Empirical illustrations show that the proposed multi-dimensional Bühlmann credibility approach produces more accurate forecasts, as measured by MAPE (mean absolute percentage error), than those based on the Lee-Carter models.
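
For illustration, a minimal sketch of a one-dimensional Bühlmann credibility forecast for mortality improvement rates across populations; the data below are toy values, and the project's multi-dimensional extension and Lee-Carter comparisons are not shown.

```python
# Minimal sketch of a (one-dimensional) Buhlmann credibility forecast
# for mortality improvement rates (toy data; not the HMD analysis).
import numpy as np

rng = np.random.default_rng(7)
n_pop, n_years = 6, 20
true_means = rng.normal(-0.02, 0.005, n_pop)   # population-specific drift
obs = true_means[:, None] + rng.normal(0, 0.01, (n_pop, n_years))

pop_means = obs.mean(axis=1)
grand_mean = pop_means.mean()
v = obs.var(axis=1, ddof=1).mean()             # expected process variance
a = pop_means.var(ddof=1) - v / n_years        # variance of hypothetical means
Z = n_years / (n_years + v / max(a, 1e-12))    # credibility factor

# Credibility-weighted forecast: shrink each population toward the grand mean.
forecast = Z * pop_means + (1 - Z) * grand_mean
print("credibility Z =", round(Z, 3))
print("forecast drifts:", np.round(forecast, 4))
```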

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Cary Chi-Liang Tsai
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.