Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Fast emulation and calibration of large computer experiments with multivariate output

Author: 
Date created: 
2019-04-17
Abstract: 

Scientific investigations are often expensive, and the ability to quickly analyze data on-location at experimental facilities can save valuable resources. Further, computer models that leverage scientific knowledge can be used to gain insight into complex processes and reduce the need for costly physical experiments, but may in turn be computationally expensive to run. We compare multiple statistical surrogates, or emulators, based on Gaussian processes for expensive computer models, with the goal of producing predictions quickly from large training sets. We then present a modularised approach to finding the values of inputs that allow the surrogate model to match reality, or field observations; this process is known as model calibration. We then extend the chosen emulator and calibration procedure for use with multivariate responses and demonstrate the speed and efficacy of such emulators on datasets from a series of transmission impact experiments.
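
As a rough illustration of the kind of emulator compared in this project, the following sketch fits a Gaussian process surrogate to toy simulator output and predicts at new inputs using scikit-learn; the toy simulator, the design, and all settings are placeholders rather than the models or data used in the essay.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 3))        # design of computer-model runs
y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2  # toy simulator output

kernel = ConstantKernel(1.0) * RBF(length_scale=[0.2, 0.2, 0.2])
emulator = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
emulator.fit(X, y)

X_new = rng.uniform(0.0, 1.0, size=(5, 3))
mean, sd = emulator.predict(X_new, return_std=True)  # fast surrogate predictions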

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Derek Bingham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Rao-Blackwellizing field-goal percentage

Date created: 
2019-03-29
Abstract: 

Shooting skill in the NBA is typically measured by field goal percentage (FG%): the number of makes out of the total number of shots. Even more advanced metrics like true shooting percentage are calculated by counting each player's 2-point, 3-point, and free throw makes and misses, ignoring the spatiotemporal data now available (Kubatko et al. 2007). In this paper we aim to better characterize player shooting skill by introducing a new estimator based on post-shot release shot-make probabilities. Via the Rao-Blackwell theorem, we propose a shot-make probability model that conditions probability estimates on shot trajectory information, thereby reducing the variance of the new estimator relative to standard FG%. We obtain shooting information by using optical tracking data to estimate three factors for each shot: entry angle, shot depth, and left-right accuracy. Next, we use these factors to model shot-make probabilities for all shots in the 2014-15 season, and use these probabilities to produce a Rao-Blackwellized FG% estimator (RB-FG%) for each player. We present a variety of results derived from these shot trajectory data and demonstrate that RB-FG% is better than raw FG% at predicting 3-point shooting and true-shooting percentages. Overall, we find that conditioning shot-make probabilities on spatial trajectory information stabilizes inference of FG%, creating the potential to estimate shooting statistics and related metrics earlier in a season than was previously possible.
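
The estimator itself is simple to compute once shot-make probabilities are available. The sketch below contrasts raw FG% with a Rao-Blackwellized version obtained by averaging fitted make probabilities per player; the synthetic shot table and the logistic model stand in for the paper's tracking data and probability model.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2_000
shots = pd.DataFrame({
    "player_id": rng.integers(0, 20, n),      # 20 hypothetical players
    "entry_angle": rng.normal(45, 3, n),      # degrees at rim entry
    "shot_depth": rng.normal(11, 2, n),       # inches past the front of the rim
    "left_right": rng.normal(0, 2, n),        # inches off centre
})
# Toy make indicator; the real outcome would come from optical tracking.
logit = 0.1 * (shots["entry_angle"] - 45) - 0.2 * np.abs(shots["left_right"])
shots["make"] = rng.random(n) < 1 / (1 + np.exp(-logit))

X = shots[["entry_angle", "shot_depth", "left_right"]]
model = LogisticRegression().fit(X, shots["make"])
shots["p_make"] = model.predict_proba(X)[:, 1]

summary = shots.groupby("player_id").agg(
    raw_fg=("make", "mean"),   # standard FG%
    rb_fg=("p_make", "mean"),  # Rao-Blackwellized FG%
)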

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Luke Bornn
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Unsupervised learning on functional data with an application to the analysis of U.S. temperature prediction accuracy

Author: 
Date created: 
2019-02-07
Abstract: 

Unsupervised learning techniques are widely applied in exploratory analysis as a starting point for further analysis. In functional data analysis, two typical unsupervised learning topics are functional principal component analysis and functional data clustering. In this study, in addition to reviewing existing unsupervised learning techniques, we extend the unsupervised random forest clustering method to functional data and identify its strengths and shortcomings through comparisons with other clustering methods in simulation studies. Finally, both the proposed method and existing unsupervised learning techniques are applied to a real data set: an analysis of the accuracy of U.S. temperature predictions from 2014 to 2017.
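
A minimal sketch of the unsupervised random forest clustering idea, applied here to placeholder FPCA scores rather than the real curves: label the observed data against a column-permuted synthetic copy, fit a random forest, derive a proximity matrix from shared terminal nodes, and cluster on one minus proximity. The project's exact implementation and tuning may differ.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
scores = rng.normal(size=(100, 4))             # placeholder FPCA scores

# Synthetic reference data: permute each column independently.
synthetic = np.column_stack([rng.permutation(col) for col in scores.T])
X = np.vstack([scores, synthetic])
y = np.r_[np.ones(len(scores)), np.zeros(len(synthetic))]

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
leaves = rf.apply(scores)                      # terminal node per tree
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Hierarchical clustering on the proximity-based dissimilarity.
condensed = 1.0 - prox[np.triu_indices(len(scores), 1)]
labels = fcluster(linkage(condensed, method="average"), t=3, criterion="maxclust")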

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Development of functional principal components analysis and estimating the time-varying gene regulation network

Author: 
Date created: 
2018-09-27
Abstract: 

Functional data analysis (FDA) addresses the analysis of information on curves or functions. Examples of such curves or functions include time-course gene expression measurements, electroencephalography (EEG) data monitoring brain activity, the emission rate of automobiles after acceleration, and growth curves of children's body fat percentage measured over time. The primary interest in the underlying curves or functions varies across fields. In this thesis, new methodology for constructing time-varying networks based on functional observations is proposed. Several variations of Functional Principal Component Analysis (FPCA) are developed in the context of the functional regression model. Lastly, new uses of FPCA are explored for recovering trajectory functions and estimating derivatives.
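
For readers unfamiliar with FPCA, the following sketch shows the basic computation on curves observed over a common grid: centre the discretized curves and take a singular value decomposition, so the right singular vectors approximate the eigenfunctions and the projections give the FPC scores. The simulated curves are placeholders, not the data analyzed in the thesis.

import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 50)                          # common observation grid
curves = (np.sin(2 * np.pi * t) * rng.normal(1, 0.3, (30, 1))
          + rng.normal(0, 0.1, (30, 50)))          # 30 noisy curves

centred = curves - curves.mean(axis=0)
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
eigenfunctions = Vt[:2]                            # first two FPCs on the grid
fpc_scores = centred @ eigenfunctions.T            # FPC scores per curve
var_explained = s[:2] ** 2 / np.sum(s ** 2)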

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Construction of orthogonal designs and baseline designs

Date created: 
2018-07-23
Abstract: 

In this thesis, we study the construction of designs for computer experiments and for screening experiments. We consider the existence and construction of orthogonal designs, which are a useful class of designs for computer experiments. We first establish a non-existence result on orthogonal designs, generalizing an early result on orthogonal Latin hypercubes, and then present some construction results. By computer search, we obtain a collection of orthogonal designs with small run sizes. Using these results and existing methods in the literature, we create a comprehensive catalogue of orthogonal designs for up to 100 runs. In the rest of the thesis, we study designs for screening experiments. We propose two classes of compromise designs for estimation of main effects using two-level fractional factorial designs under baseline parameterization. Previous work in the area indicates that orthogonal arrays are more efficient than one-factor-at-a-time designs whereas the latter are better than the former in terms of minimizing the bias due to non-negligible interactions. Using efficiency criteria, we examine a class of compromise designs, which are obtained by adding runs to one-factor-at-a-time designs. A theoretical result is established for the case of adding one run. For adding two or more runs, we develop a complete search algorithm for finding optimal compromise designs. We also investigate another class of compromise designs, which are constructed from orthogonal arrays by changing some ones to zeros in design matrices. We then use a method of complete search for small run sizes to obtain optimal compromise designs. When the complete search is not feasible, we propose an efficient, though incomplete, search algorithm.
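
The column-orthogonality property at the centre of the catalogue can be checked directly; the sketch below verifies it for a small illustrative two-level design, which is not one of the catalogued designs.

import numpy as np

def is_orthogonal(design, tol=1e-10):
    """Return True if all distinct centred columns have zero inner product."""
    centred = design - design.mean(axis=0)
    gram = centred.T @ centred
    off_diag = gram - np.diag(np.diag(gram))
    return np.all(np.abs(off_diag) < tol)

# A two-level design in 4 runs (a regular 2^{3-1} fraction with levels +/-1).
d = np.array([[-1, -1,  1],
              [-1,  1, -1],
              [ 1, -1, -1],
              [ 1,  1,  1]])
print(is_orthogonal(d))  # True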

Document type: 
Thesis
File(s): 
Senior supervisor: 
Boxin Tang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Quantifying inter-generational equity under different target benefit plan designs

Author: 
Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2018-06-20
Abstract: 

In this research, we investigate the value of inter-generational transfers under various target benefit plan designs. The contingent retirement benefits are decomposed into embedded options, and the risk-adjusted values of these options are calculated and compared across generations. For this purpose, an economic scenario generator is implemented: the dynamics of the economic variables are generated by a model that combines a first-order vector autoregressive model and the generalized autoregressive conditional heteroscedasticity process. A corresponding risk-neutral model is derived and estimated using the prices of financial assets; the latter is used to price the embedded options. We study four target benefit plans with different design elements. We find that inter-generational value transfers arise simply from joining the collective pension scheme, even without the inclusion of any intertemporal benefit smoothing designs. Without an additional source of funding, we show that benefit security and stability can be achieved by adopting plan designs that allow temporary inter-generational subsidization, e.g., plan designs with a no-action range. We show that adding a symmetric no-action range can reduce the volatility of retirement benefits without triggering significant value transfers, at least under the assumption of a stationary demographic profile and when the simulation of economic scenarios starts from its long-term equilibrium level.
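
The scenario generator can be sketched as a first-order vector autoregression whose innovations carry GARCH(1,1) conditional variances, as below; all parameter values are illustrative assumptions rather than the estimates obtained in the project.

import numpy as np

rng = np.random.default_rng(3)
A = np.array([[0.5, 0.1], [0.0, 0.8]])   # VAR(1) coefficient matrix
mu = np.array([0.02, 0.03])              # long-run means
omega, alpha, beta = 1e-5, 0.08, 0.90    # GARCH(1,1) parameters

T, k = 480, 2                            # 40 years of monthly scenarios
x = np.tile(mu, (T + 1, 1))
h = np.full(k, omega / (1 - alpha - beta))   # start at stationary variance
eps_prev = np.zeros(k)

for t in range(1, T + 1):
    h = omega + alpha * eps_prev**2 + beta * h        # conditional variances
    eps_prev = np.sqrt(h) * rng.standard_normal(k)
    x[t] = mu + A @ (x[t - 1] - mu) + eps_prev        # VAR(1) recursion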

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Barbara Sanders
Jean-François Bégin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Statistical inference using large administrative data on multiple event times, with application to cancer survivorship research

Author: 
Date created: 
2018-12-20
Abstract: 

Motivated by the breast cancer survivorship research program at BC Cancer Agency, this dissertation develops statistical approaches to analyzing right-censored multivariate event time data. Following the background and motivation of the research, we introduce the framework of the dissertation and provide a literature review and a list of the research questions. A description of the motivating study data is then given, together with a preliminary analysis, before presenting the proposed approaches as follows. We first consider estimation of the joint survivor function of multiple event times when the observations are subject to informative censoring due to a terminating event. We formulate the potential dependence of the multiple event times on the time to the terminating event using Archimedean copulas. This may account for the informative censoring and, at the same time, allows us to adapt the commonly used two-step procedure for estimating the joint distribution of the multiple event times under a copula model. We propose an easy-to-implement pseudo-likelihood based estimation procedure under the model, which reduces computational intensity compared to its MLE counterpart. A more flexible approach is then proposed to handle informative censoring, with particular attention to observations on a bivariate event time potentially censored by a terminating event. We formulate the correlation of the bivariate event time with the censoring time by embedding the bivariate event time distribution in a bivariate copula model. This yields the convenience of inference under the conventional copula model. At the same time, the proposed model is more flexible, and thus potentially more appropriate in many practical situations, than modeling the event times and the associated censoring time jointly by a single multivariate copula. Adapting the commonly used two-stage estimation procedure under a copula model, we develop an easy-to-implement estimator for the joint survivor function of the two event times. A by-product of the proposed approaches is an estimator for the marginal distribution of a single event time with semicompeting-risks data. Further, we extend the approach to regression settings to explore covariate effects in either parametric or nonparametric forms. In particular, adjusting for some covariates, we compare two populations based on an event time with observations subject to informative censoring. We conduct both asymptotic and simulation studies to examine the consistency, efficiency, and robustness of the proposed approaches. The breast cancer program that motivated this research is used to illustrate the methodological development throughout the dissertation.
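
To make the two-step idea concrete, the sketch below applies it to complete (uncensored) toy data under a Clayton copula: estimate the marginals by ranks, then recover the dependence parameter by inverting Kendall's tau (for the Clayton copula, theta = 2*tau / (1 - tau)). The adjustments for right censoring and the terminating event, which are the substance of the dissertation, are not shown.

import numpy as np
from scipy.stats import kendalltau, rankdata

rng = np.random.default_rng(4)
t1 = rng.exponential(1.0, 500)
t2 = 0.5 * t1 + rng.exponential(1.0, 500)   # correlated toy event times

# Step 1: empirical marginal distributions (probability-integral transform).
u1 = rankdata(t1) / (len(t1) + 1)
u2 = rankdata(t2) / (len(t2) + 1)

# Step 2: Clayton dependence parameter from Kendall's tau.
tau, _ = kendalltau(u1, u2)
theta_hat = 2 * tau / (1 - tau)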

Document type: 
Thesis
File(s): 
Senior supervisor: 
X. Joan Hu
John J. Spinelli
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Masquerade detection: A topic model based approach

Date created: 
2018-12-19
Abstract: 

The goal of masquerade detection is to detect when an intruder has infiltrated a computer system by looking for evidence of malicious behaviour. In this project, I use a topic model based intrusion detection system to search for intruders within the SEA and Greenberg datasets of Unix computer commands. Using LDA topic modeling, I estimate, for each user, a probability distribution over topics both for each block of commands and for each individual command. Using these two probability distributions and building on previous detection techniques, I create five different detection methods; I describe how the five LDA-based models are constructed and combine them into a sixth model. All of these methods performed on par with their non-LDA counterparts. Therefore, combined with the reduction in dimensionality afforded by the LDA topic model, I conclude that my methods perform better than the current models.
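
A minimal sketch of the LDA step, using scikit-learn on placeholder command blocks: each block of commands is treated as a document, the fitted model gives a per-block topic distribution, and a new block can be scored by its log-likelihood under a user's model. Block contents, sizes, and the number of topics are illustrative, not those used in the project.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Each training "document" is one block of commands joined by spaces;
# real blocks would contain many more commands per block.
train_blocks = [
    "ls cd grep awk ls vi make",
    "make gcc gdb ls cat ps kill",
]
test_block = ["ls cd cat mail sendmail rm"]

vectorizer = CountVectorizer(token_pattern=r"\S+")
X_train = vectorizer.fit_transform(train_blocks)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X_train)
topic_dist = lda.transform(X_train)            # topic mixture per block

# A low log-likelihood for a new block relative to the user's own
# training blocks flags a possible masquerader.
score = lda.score(vectorizer.transform(test_block))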

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Derek Bingham
David Campbell
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Partial stratification in capture-recapture experiments and integrated population modeling with radio telemetry

Date created: 
2018-12-19
Abstract: 

In this thesis, we develop and apply three new methods for ecological data sets: two developments related to capture-recapture studies and one related to integrated population modeling. In the first project, we present new methods using partial stratification in two-sample capture-recapture experiments for closed populations. Capture heterogeneity is known to cause bias in estimates of abundance in capture-recapture experiments. This heterogeneity is often related to observable fixed characteristics of the animals, such as sex. If this information can be observed for each handled animal at both sample occasions, then it is straightforward to stratify (e.g., by sex) and obtain stratum-specific estimates. However, in many fishery experiments it is difficult to sex all captured fish because morphological differences are slight or because of logistical constraints. In these cases, a sub-sample of the captured fish at each sample occasion is selected and additional, often more costly, measurements are made, such as sex determination through sacrificing the fish. We develop new methods to estimate abundance for these types of experiments. Furthermore, we develop methods for optimal allocation of effort for a given cost. We also develop methods to account for additional information (e.g., prior information about the sex ratio) and for supplemental continuous covariates such as length. These methods are applied to the problem of estimating the size of the walleye population in Mille Lacs Lake, Minnesota, USA. In the second project, we present new methods for estimating abundance using partial stratification in k-sample (k >= 2) capture-recapture experiments of a closed population with known losses on capture. We develop the methods for large populations using both maximum likelihood and a Bayesian approach, and illustrate them with simulated data with known losses on capture. In the third project, we present an integrated population model using capture-recapture, dead recovery, snorkel, and radio telemetry surveys. We apply this model to Chinook salmon on the West Coast of Vancouver Island, Canada, to estimate spawning escapement and to describe movement from the ocean to the spawning grounds, accounting for stopover time, stream residence time, and snorkel survey observer efficiency.
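
As background for the partial stratification methods, the sketch below shows the fully stratified baseline they generalize: the Chapman-corrected Lincoln-Petersen estimator applied separately to each sex, with purely illustrative counts rather than the walleye data.

def chapman_estimate(n1, n2, m2):
    """Abundance estimate from n1 first-sample captures, n2 second-sample
    captures, and m2 marked recaptures (Chapman-corrected Lincoln-Petersen)."""
    return (n1 + 1) * (n2 + 1) / (m2 + 1) - 1

strata = {"male": (400, 350, 60), "female": (300, 320, 45)}   # (n1, n2, m2)
estimates = {sex: chapman_estimate(*counts) for sex, counts in strata.items()}
total_abundance = sum(estimates.values())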

Document type: 
Thesis
File(s): 
Senior supervisor: 
Carl Schwarz
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Cooperation in target benefit plans: A game theoretical perspective

Author: 
Date created: 
2018-12-12
Abstract: 

Many occupational pension plans rely on intergenerational cooperation to deliver stable retirement benefits; however, this cooperation has natural limits and exceeding these limits can threaten the sustainability of the plan. In this project, we cast the problem of intergenerational cooperation within funded pension plans in a game theoretic framework that incorporates overlapping generations and uncertainty in the cost of cooperation. Employing the concept of a subgame perfect equilibrium, we determine the threshold above which cooperation should not be enforced. Using two different processes for the stochastic cost of cooperation, we illustrate the combination of parameters that allow for the existence of a reasonable threshold, and study how the level of prefunding and the stochastic process parameters affect both the threshold and the probability of sanctioned non-cooperation.
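
The flavour of the computation can be sketched with a simple Monte Carlo: simulate a stochastic cost-of-cooperation process and estimate the probability that it ever exceeds a candidate enforcement threshold. The AR(1) cost process and all numbers below are illustrative assumptions, not the project's calibrated model.

import numpy as np

rng = np.random.default_rng(5)
phi, sigma, c_bar = 0.7, 0.15, 1.0     # persistence, volatility, mean cost
threshold = 1.35                       # cooperation not enforced above this

n_paths, horizon = 10_000, 40
c = np.full(n_paths, c_bar)
exceed = np.zeros(n_paths, dtype=bool)
for _ in range(horizon):
    c = c_bar + phi * (c - c_bar) + sigma * rng.standard_normal(n_paths)
    exceed |= c > threshold

prob_non_cooperation = exceed.mean()   # share of paths ever above the threshold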

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Barbara Sanders
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.