Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Statistical modelling of temporary stream flow in Canadian prairie provinces

Author: 
Date created: 
2012-08-20
Abstract: 

Accurate forecasting of stream flow is of vital importance in semi-arid regions in order to meet human needs, such as agriculture, as well as those of wildlife. It is also of considerable interest for predicting stream flow in ungauged basins and for detecting change due to land use or climate variations. Daily streamflows in semi-arid and arid regions are characterized by zero-inflation, seasonality, autoregression and extreme events such as floods and droughts. Analyses at the level of daily data for intermittent streams are problematic because of the preponderance of zero flows. Basic modelling approaches are often inappropriate when many zero-flow events are present; approaches need to be modified to allow greater flexibility in incorporating zeros than is possible with traditional methods. This project discusses the utility of spline compartment models for the analysis of data from intermittent streams, whereby the log-odds of the probability of a non-zero flow day, as well as the logarithm of the non-zero flow rate, can be studied. These models permit handling of large numbers of zero-flow days, and the use of splines and other smoothers has the benefit of permitting a wide range of distributional shapes to be fitted. The models are illustrated for ten streams in the Canadian Prairie Provinces.
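
The two-part structure described above can be sketched in a few lines: a logistic model for the log-odds of a non-zero flow day, plus a log-scale regression for the flow rate on wet days. This is a minimal illustration on synthetic seasonal data with a hand-rolled truncated-power spline basis; it is not the project's actual spline compartment model, and all parameter values are invented.

```python
import numpy as np
import statsmodels.api as sm

def spline_basis(x, knots):
    """Truncated-power cubic spline basis (a simple stand-in for the
    smoothers used in the project)."""
    cols = [x, x**2, x**3] + [np.clip(x - k, 0, None)**3 for k in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(1)
day = np.tile(np.arange(1, 366), 3)           # three years of daily records
season = np.exp(-((day - 150) / 40) ** 2)     # invented seasonal wet pulse
p_wet = 1 / (1 + np.exp(-(-3 + 6 * season)))  # P(non-zero flow)
wet = rng.random(day.size) < p_wet
flow = np.where(wet, rng.lognormal(np.log1p(5 * season), 0.5), 0.0)

X = sm.add_constant(spline_basis(day / 365.0, knots=[0.25, 0.5, 0.75]))

# Part 1: log-odds of a non-zero flow day (logistic regression on splines).
occurrence = sm.GLM((flow > 0).astype(float), X,
                    family=sm.families.Binomial()).fit()

# Part 2: log of the non-zero flow rate, fitted to the wet days only.
pos = flow > 0
magnitude = sm.OLS(np.log(flow[pos]), X[pos]).fit()

print(occurrence.params.round(2))
print(magnitude.params.round(2))
```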

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Charmaine Dean
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Investigating the use of the accelerated hazards model for survival analysis

Author: 
Date created: 
2010-12-09
Abstract: 

This project contrasts the Proportional Hazards, Accelerated Failure Time and Accelerated Hazards (AH) models in the analysis of time-to-event data. Unlike the other two models considered, the AH model handles data that exhibit crossing of the survival and hazard curves. The three models are illustrated on five contrasting data sets. A simulation study is conducted to assess the small-sample performance of the AH model by quantifying the mean squared error of the predicted survivor curves under scenarios of crossing and non-crossing survivor curves. The results show that the AH model can perform poorly under model misspecification for models with a crossing hazard. Problems with variance estimation of parameters in the AH model are observed for small sample sizes, and a bootstrap approach is offered as an alternative method of quantifying the precision of estimates.
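
A quick way to see why the AH model is distinctive is to compare how the three models transform a common baseline. The sketch below uses a hump-shaped baseline hazard with a closed-form cumulative hazard and an invented effect size theta; under PH and AFT the treated survivor curve stays below the baseline, while under AH it crosses. This illustrates the model forms only, not the project's simulation study.

```python
import numpy as np

# Hump-shaped baseline hazard lam0(t) = t * exp(-t); its cumulative hazard
# has the closed form Lam0(t) = 1 - (1 + t) * exp(-t).
def Lam0(t):
    return 1.0 - (1.0 + t) * np.exp(-t)

t = np.linspace(0.01, 8.0, 400)
theta = 0.7   # invented covariate effect for the treated group

S_base = np.exp(-Lam0(t))                                  # z = 0
S_ph   = np.exp(-np.exp(theta) * Lam0(t))                  # proportional hazards
S_aft  = np.exp(-Lam0(t * np.exp(theta)))                  # accelerated failure time
S_ah   = np.exp(-Lam0(t * np.exp(theta)) / np.exp(theta))  # accelerated hazards

# PH and AFT keep the treated curve below the baseline everywhere;
# only the AH curve crosses it.
for name, S in [("PH", S_ph), ("AFT", S_aft), ("AH", S_ah)]:
    crossings = np.sum(np.diff(np.sign(S - S_base)) != 0)
    print(name, "crossings of the baseline survivor curve:", crossings)
```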

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Charmaine Dean
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Some perspectives of smooth and locally sparse estimators

Author: 
Date created: 
2013-07-15
Abstract: 

In this thesis we develop new techniques for computing smooth and, at the same time, locally sparse (i.e., zero on some sub-regions) estimators of functional principal components (FPCs) in functional principal component analysis (FPCA) and of coefficient functions in functional linear regression (FLR). Like sparse models in ordinary data analysis, locally sparse estimators in functional data analysis enjoy less variability and better interpretability. In the first part of the thesis, we develop smooth and locally sparse estimators of FPCs. For an FPC, the sub-regions on which it has significant magnitude are interpreted as where the sample curves have major variations, and the non-null sub-regions of our estimated FPCs coincide with these sub-regions, which makes the derived FPCs easier to interpret. An efficient algorithm is designed to compute our estimators using projection deflation. Our estimators are strongly consistent and asymptotically normal under mild conditions. Simulation studies also show that FPCs estimated by our method explain variation in the sample curves similar to that explained by FPCs estimated by other methods. In the second part of the thesis, we develop a new regularization technique called “functional SCAD” (fSCAD), the functional generalization of the well-known SCAD (smoothly clipped absolute deviation) regularization, and then apply it to derive a smooth and locally sparse estimator of the coefficient function in FLR. The fSCAD enables us to identify the null sub-regions of the coefficient function without over-shrinking the non-zero values, while the smoothness of our estimator is regularized by a roughness penalty. We also develop an efficient algorithm to compute the estimator in practice via a B-spline expansion. An asymptotic analysis shows that our estimator enjoys the oracle property, i.e., it performs as well as if we knew the true null sub-regions of the coefficient function in advance. Simulation studies show that our estimator has superior numerical performance.
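
For intuition, the scalar SCAD penalty and its closed-form thresholding rule can be written down directly; fSCAD in the thesis applies a penalty of this form to the magnitude of the coefficient function over sub-regions of its domain. The sketch below is a univariate illustration with invented inputs, not the thesis's functional estimator.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001), applied elementwise."""
    b = np.abs(beta)
    p1 = lam * b
    p2 = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))
    p3 = lam**2 * (a + 1) / 2
    return np.where(b <= lam, p1, np.where(b <= a * lam, p2, p3))

def scad_threshold(z, lam, a=3.7):
    """Closed-form SCAD thresholding for a univariate problem: small
    inputs are set exactly to zero (sparsity) while large inputs are
    left unshrunk (no over-shrinkage of non-zero values)."""
    az = np.abs(z)
    soft = np.sign(z) * np.maximum(az - lam, 0)
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return np.where(az <= 2 * lam, soft, np.where(az <= a * lam, mid, z))

z = np.array([-4.0, -1.5, -0.3, 0.2, 0.8, 2.5, 5.0])
print(scad_threshold(z, lam=1.0))  # small values -> 0, large values untouched
```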

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

A tutorial on the inheritance procedure for multiple testing of tree-structured hypotheses

Author: 
Date created: 
2013-07-25
Abstract: 

In a candidate gene association study the goal is to find associations between a trait of interest and genetic variation at markers, such as single-nucleotide polymorphisms, or SNPs. SNPs are grouped within candidate genes thought to influence the trait. Such grouping imposes a tree structure on the hypotheses, with hypotheses about single-SNP associations nested within gene-based associations. In this project we give a tutorial on the inheritance procedure, a powerful new method for testing tree-structured hypotheses. We define sequentially rejective procedures and show that the inheritance procedure is a sequentially rejective procedure that strongly controls the family-wise error rate under so-called monotonicity and single step conditions. We also show how to further improve power by taking advantage of the logical implications among the nested hypotheses. The resulting testing strategy enables more powerful detection of gene- and SNP-based associations, while controlling the chance of incorrectly claiming that such associations exist.
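
The flavour of a sequentially rejective tree procedure can be conveyed with a toy example: alpha-weight is split among the genes, flows down to a gene's SNPs when the gene-level hypothesis is rejected, and weight freed by rejected leaves is inherited by the remaining testable hypotheses. The tree, p-values and redistribution rule below are simplified and hypothetical; the actual inheritance procedure's weight-passing rules differ in detail.

```python
# Toy tree: two hypothetical genes nesting three hypothetical SNPs.
alpha = 0.05
tree = {"gene1": ["snp1", "snp2"], "gene2": ["snp3"]}
pval = {"gene1": 0.004, "snp1": 0.012, "snp2": 0.30,
        "gene2": 0.20, "snp3": 0.001}

weight = {g: 1.0 / len(tree) for g in tree}  # split alpha-weight among genes
rejected = set()
changed = True
while changed:                               # sequential rejection to a fixed point
    changed = False
    for h, w in list(weight.items()):
        if h in rejected or w == 0 or pval[h] > alpha * w:
            continue
        rejected.add(h)
        changed = True
        heirs = tree.get(h, [])              # a gene passes weight to its SNPs
        if not heirs:                        # a leaf: open hypotheses inherit
            heirs = [k for k in weight if k not in rejected]
        weight[h] = 0.0
        if heirs:
            for k in heirs:
                weight[k] = weight.get(k, 0.0) + w / len(heirs)

print(sorted(rejected))  # gene1 and snp1 are rejected; snp3 stays gated behind gene2
```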

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Brad McNeney
Jinko Graham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Use of genetic algorithms for optimal investment strategies

Author: 
Date created: 
2013-04-17
Abstract: 

In this project, a genetic algorithm (GA) is used to develop investment strategies: to decide the optimal asset allocations backing a portfolio of term insurance contracts, and the re-balancing strategy for responding to changing conditions such as interest rates and mortality experience. The objective function maximized by the GA accommodates three objectives that should be of interest to management in insurance companies: maximizing the total value of wealth at the end of the period, minimizing the variance of the total value of wealth across the simulated interest rate scenarios, and achieving consistent returns on the portfolio from year to year. One objective may conflict with another, and the GA searches the large space of solutions for one that favors a particular objective, as specified by the user, while not worsening the other objectives too much. Duration matching, a popular approach for managing the risks underlying traditional life insurance portfolios, is used as a benchmark to examine the effectiveness of the strategies obtained through the use of genetic algorithms. Experiments are conducted to compare the performance of the investment strategy proposed by the genetic algorithm to the duration matching strategy in terms of the different objectives under the testing scenarios. The results illustrate that, with the help of the GA, we are able to find a strategy very similar to the duration matching strategy. We are also able to find other strategies that outperform duration matching in terms of some of the desired objectives and are robust under the tested changes in interest rates and mortality.
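
A bare-bones version of such a GA fits in a few dozen lines: candidate solutions are allocation weight vectors, and the fitness function combines the three objectives with user-chosen priority coefficients. Everything below (asset classes, return distributions, GA settings) is invented for illustration; the project's version operates on simulated interest rate and mortality scenarios backing term insurance liabilities.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 4 asset classes, 200 simulated 10-year return scenarios.
n_assets, n_scen, n_years = 4, 200, 10
returns = rng.normal(loc=[0.02, 0.03, 0.05, 0.07],
                     scale=[0.01, 0.02, 0.08, 0.15],
                     size=(n_scen, n_years, n_assets))

def fitness(w, a=1.0, b=2.0, c=1.0):
    """Weighted combination of the three objectives: terminal wealth (max),
    variance across scenarios (min), year-to-year consistency (min).
    The coefficients a, b, c are placeholder user priorities."""
    port = returns @ w                          # yearly portfolio returns
    wealth = np.prod(1 + port, axis=1)          # terminal wealth per scenario
    return a * wealth.mean() - b * wealth.var() - c * port.std(axis=1).mean()

def normalize(w):
    w = np.abs(w)
    return w / w.sum()

pop = np.array([normalize(rng.random(n_assets)) for _ in range(50)])
for gen in range(100):
    fit = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(fit)[-25:]]        # truncation selection
    kids = []
    for _ in range(25):
        i, j = rng.integers(25, size=2)
        cut = rng.integers(1, n_assets)         # one-point crossover
        child = np.concatenate([parents[i][:cut], parents[j][cut:]])
        child += rng.normal(0, 0.05, n_assets)  # mutation
        kids.append(normalize(child))
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(w) for w in pop])]
print("best allocation:", best.round(3))
```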

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Gary Parker
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

The hot hand in golf

Date created: 
2013-04-26
Abstract: 

In this project, an analysis is conducted to determine whether the phenomenon known as the hot hand exists in golf. Data from a particular golf tournament in 2012 are studied to investigate this proposition. For this tournament, the scores for each golfer are split into the number of strokes and the number of putts required to complete the course. The key idea in this project is the substitution of the number of putts with the expected number of putts. The rationale is that putting is a highly stochastic element of golf and that the randomness conceals evidence of the hot hand. This expected value is based on the distance to the pin once the ball is on the green, a distance obtained from the ShotLink website. New scores for all golfers are calculated as the sum of the number of strokes plus the expected number of putts required to complete a course. The association between these scores in the first round and similar scores in the second round is calculated. The results seem to point to the conclusion that there is no hot hand in golf.
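
The adjustment step can be sketched directly: replace each golfer's actual putts with the expected number of putts given first-putt distance, then correlate the adjusted scores across rounds. The data and expected-putts curve below are entirely synthetic stand-ins (the project derived distances from ShotLink), so the near-zero correlation here simply reflects the independently simulated rounds.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data for 100 golfers over two rounds: strokes to reach each
# green, plus first-putt distances in feet.
n_golfers, n_holes = 100, 18
strokes = rng.poisson(3.0, size=(2, n_golfers, n_holes))
dist = rng.gamma(2.0, 8.0, size=(2, n_golfers, n_holes))

def expected_putts(d):
    """Crude stand-in for an expected-putts-given-distance curve: roughly
    1 putt when very close, approaching 2+ from long range."""
    return 2.2 - 1.2 * np.exp(-d / 15.0)

# Adjusted score per round: strokes plus expected (not actual) putts,
# removing much of the putting randomness.
adj = strokes.sum(axis=2) + expected_putts(dist).sum(axis=2)

r = np.corrcoef(adj[0], adj[1])[0, 1]
print(f"round 1 vs round 2 correlation: {r:.3f}")
# A correlation near zero is evidence against a hot hand.
```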

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Tim Swartz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Hockey pools for profit: A simulation based player selection strategy

Author: 
Date created: 
2005
Abstract: 

The goal of this project is to develop an optimal player selection strategy for a common playoff hockey pool. The challenge is to make the strategy applicable in real time. Most selection methods rely on the draftee's hockey knowledge. Our selection strategy was created by applying appropriate statistical models to regular season data and introducing a reasonable optimality criterion. A simulated draft is performed in order to test our selection method. The results suggest that the approach is superior to several ad hoc strategies.
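
A skeleton of such a strategy might look as follows: score each player by expected playoff points (regular season scoring rate weighted by the chance his team keeps advancing), then pick greedily in a simulated snake draft. Player values, advancement probabilities and pool rules are all invented here; the project's statistical models and optimality criterion are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical pool: 60 skaters with regular-season points per game and a
# crude per-round probability that each player's team keeps advancing.
n_players = 60
ppg = rng.gamma(9.0, 0.1, n_players)
p_advance = rng.uniform(0.3, 0.7, n_players)

def expected_points(ppg, p, rounds=4, games=7):
    """Expected playoff points: full games in each round reached, weighted
    by the probability of reaching that round."""
    reach = np.array([p**r for r in range(rounds)])  # P(reach round r)
    return ppg * games * reach.sum(axis=0)

value = expected_points(ppg, p_advance)

# Ten-team snake draft, six rounds; every team greedily takes the highest
# expected-point player available -- a stand-in for the optimality criterion.
n_teams, n_rounds = 10, 6
available, my_team = set(range(n_players)), []
for rnd in range(n_rounds):
    order = range(n_teams) if rnd % 2 == 0 else range(n_teams - 1, -1, -1)
    for team in order:
        pick = max(available, key=lambda i: value[i])
        available.remove(pick)
        if team == 0:
            my_team.append(pick)

print("expected points of our six picks:", value[my_team].round(1))
```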

Document type: 
Thesis
File(s): 
Department: 
Department of Statistics and Actuarial Science - Simon Fraser University
Thesis type: 
Project (M.Sc.)

Methods for the analysis of spatio-temporal multi-state processes

Author: 
Date created: 
2005
Abstract: 

Studies of recurring infection or chronic disease often collect longitudinal data on the disease status of subjects. Multi-state transitional models are commonly used for describing the development of such longitudinal data. In this setting, we model a stochastic process which at any point in time occupies one of a discrete set of states, and interest centers on the transition process between states. For example, states may refer to the number of recurrences of an event or the stage of a disease. Geographic referencing of data collected in longitudinal studies is progressively more common as scientific databases are being linked with GIS systems. This has created a need for statistical methods addressing the resulting spatial-longitudinal structure of the data. In this thesis, we develop hierarchical mixed multi-state models for the analysis of such longitudinal data when the processes corresponding to different subjects may be correlated spatially over a region. Methodological developments have been strongly driven by studies in forestry and spatial epidemiology. Motivated by an application in forest ecology studying pine weevil infestations, the second chapter develops methods for handling mixtures of populations for spatial discrete-time two-state processes. The two-state discrete-time transitional model, often used for studying chronic conditions in human populations, is extended to settings where subjects are spatially arranged. A mixed spatially correlated mover-stayer model is developed. Here, clustering of infection is modelled by a spatially correlated random effect reflecting the density or closeness of the individuals under study. Analysis is carried out using maximum likelihood, with a Monte Carlo EM algorithm for implementation, and also using a fully Bayesian analysis. The third chapter presents continuous-time spatial multi-state models. Here, joint modelling of both the spatial correlation and the correlation between different transition rates is required, and a multivariate spatial approach is employed. A proportional intensities frailty model is developed where baseline intensity functions are modelled using both parametric Weibull forms and flexible representations based on cubic B-splines. The methodology is applied to a study of invasive cardiac procedures in Quebec examining readmission and mortality rates over a four-year period. Finally, in the fourth chapter we return to the two-state discrete-time setting. An extension of the mixed mover-stayer model is motivated and developed within the Bayesian framework. Here, a multivariate conditional autoregressive (MCAR) model is incorporated, providing flexible joint correlation structures. We also consider a test for the number of mixture components, quantifying the existence of a hidden subgroup of 'stayers' within the population. Posterior summarization is based on a Metropolis-Hastings sampler, and methods for assessing model goodness-of-fit are based on posterior predictive comparisons.
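
As a small illustration of the mover-stayer structure in the second chapter, the sketch below generates data from a two-state discrete-time model in which a hidden subgroup of 'stayers' never leaves the healthy state and a spatially correlated random effect induces clustering of infection. This is a data-generating sketch with invented parameters only; fitting such models, via Monte Carlo EM or fully Bayesian computation, is the substantive work of the thesis.

```python
import numpy as np

rng = np.random.default_rng(5)

# 200 hypothetical trees on a 10 x 10 plot, observed at 10 time points.
n, T = 200, 10
xy = rng.uniform(0, 10, size=(n, 2))
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
cov = np.exp(-d / 2.0)                          # exponential spatial covariance
b = rng.multivariate_normal(np.zeros(n), cov)   # spatially correlated effects

stayer = rng.random(n) < 0.3                    # hidden 'stayer' subgroup
state = np.zeros((n, T), dtype=int)             # 0 = healthy, 1 = infected
for t in range(1, T):
    logit = -2.0 + 1.5 * state[:, t - 1] + b    # first-order transition model
    p = 1.0 / (1.0 + np.exp(-logit))
    state[:, t] = np.where(stayer, 0, (rng.random(n) < p).astype(int))

print("infected at the final time point:", int(state[:, -1].sum()), "of", n)
```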

Document type: 
Thesis
File(s): 
Department: 
Department of Statistics and Actuarial Science - Simon Fraser University
Thesis type: 
Thesis (Ph.D.)

Confidentiality and variance estimation in complex surveys

Author: 
Date created: 
2004
Abstract: 

A variance estimator in a large survey based on the jackknife or balanced repeated replication typically requires a large number of replicates and replicate weights. Reducing the number of replicates has important advantages for computation and for limiting the risk of data disclosure in public-use data files. In the first part of this thesis, we propose algorithms adapted from scheduling theory to reduce the number of replicates. The algorithms are simple and efficient and can easily be adapted to account for analytic domains. An important concern with combining strata is that the resulting variance estimators may be inconsistent. We establish conditions for the consistency of the variance estimators and give bounds on the attained precision of the variance estimators that are linked to the consistency conditions. The algorithms are applied both to a real sample survey and to samples from simulated populations, and they perform very well in attaining variance estimators with precision levels close to the upper bounds. Another important issue in survey sampling is the conflict between information sharing and disclosure control. Statistical agencies routinely release microdata for public use with stratum and/or cluster indicators suppressed for confidentiality. For the purpose of variance estimation, pseudo-cluster indicators are sometimes produced for use in linearization methods, or replicate weights for use in resampling methods. If care is not taken, these can be used to (partially) reconstruct the stratum and/or cluster indicators and thus inadvertently break confidentiality. In the second part of this thesis, we demonstrate the dangers and adapt algorithms from scheduling theory and elsewhere to attempt to reduce this risk.
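
The scheduling-theory connection can be illustrated with the classic longest-processing-time (LPT) heuristic: pack strata into a small number of balanced groups, each of which then yields a single delete-a-group replicate. The stratum sizes and group count below are invented, and this is only one plausible reading of the thesis's algorithms, not their actual form.

```python
import heapq
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical stratified sample: 40 strata of varying sizes, to be packed
# into 8 balanced variance strata via LPT scheduling.
sizes = rng.integers(20, 200, size=40)
G = 8

def lpt_groups(sizes, G):
    heap = [(0, g) for g in range(G)]        # (current load, group id)
    heapq.heapify(heap)
    groups = [[] for _ in range(G)]
    for s_idx in np.argsort(sizes)[::-1]:    # largest stratum first
        load, g = heapq.heappop(heap)        # least-loaded group gets it
        groups[g].append(int(s_idx))
        heapq.heappush(heap, (load + int(sizes[s_idx]), g))
    return groups

groups = lpt_groups(sizes, G)
print("group totals:", [int(sizes[g].sum()) for g in groups])
# Each group then yields one delete-a-group jackknife replicate instead of
# one replicate per stratum, cutting 40 replicates down to 8.
```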

Document type: 
Thesis
File(s): 
Department: 
Department of Statistics and Actuarial Science - Simon Fraser University
Thesis type: 
Thesis (Ph.D.)

Parametric changepoint survival model with application to coronary artery bypass graft surgery data

Author: 
Date created: 
2005
Abstract: 

Typical survival analyses treat the time to failure as a response and use parametric models, such as the Weibull or log-normal, or non-parametric methods, such as Cox proportional hazards analysis, to estimate survivor functions and investigate the effect of covariates. In some circumstances, for example where treatment is harsh, the empirical survivor curve appears segmented, with a steep initial descent followed by a plateau or less sharp decline. This is the case in the analysis of survival experience after coronary artery bypass surgery, the application which motivated this project. We employ a parametric Weibull changepoint model for the analysis of such data, and bootstrap procedures for estimation of standard errors. In addition, we consider the effect on the analyses of rounding of the data, with such rounding leading to large numbers of ties.
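
A minimal version of such a model specifies one Weibull hazard before the changepoint tau and another after it, and profiles the likelihood over a grid of candidate changepoints; bootstrap standard errors would repeat the fit on resampled data. All parameter values below are invented, and the sketch ignores censoring and the rounding/ties issue discussed in the project.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)

def cum_haz(t, a1, b1, a2, b2, tau):
    """Cumulative hazard of a Weibull changepoint model: one Weibull form
    before tau, another after."""
    return (a1 * np.minimum(t, tau) ** b1
            + a2 * (np.maximum(t, tau) ** b2 - tau ** b2))

def haz(t, a1, b1, a2, b2, tau):
    return np.where(t <= tau,
                    a1 * b1 * t ** (b1 - 1),
                    a2 * b2 * t ** (b2 - 1))

def neg_loglik(theta, t, tau):
    a1, b1, a2, b2 = np.exp(theta)   # log-parametrization keeps parameters > 0
    return -(np.sum(np.log(haz(t, a1, b1, a2, b2, tau)))
             - np.sum(cum_haz(t, a1, b1, a2, b2, tau)))

# Simulate 500 failure times by inverse transform from an invented truth:
# a steep early hazard, flatter after the changepoint at tau = 1.
true = dict(a1=1.2, b1=0.7, a2=0.15, b2=1.0, tau=1.0)
grid = np.linspace(1e-3, 30, 4000)   # effectively caps simulated times at 30
H = cum_haz(grid, **true)
t_obs = np.interp(-np.log(rng.random(500)), H, grid)

# Profile the likelihood over a grid of candidate changepoints.
taus = np.linspace(0.3, 3.0, 28)
fits = [minimize(neg_loglik, np.zeros(4), args=(t_obs, tau)) for tau in taus]
best = int(np.argmin([f.fun for f in fits]))
print("estimated changepoint:", round(float(taus[best]), 2))
```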

Document type: 
Thesis
File(s): 
Department: 
Department of Statistics and Actuarial Science - Simon Fraser University
Thesis type: 
Project (M.Sc.)