Statistics and Actuarial Science - Theses, Dissertations, and other Required Graduate Degree Essays

Joint modeling of longitudinal and time-to-event data with the application to kidney transplant data

Author: 
Date created: 
2018-12-13
Abstract: 

This thesis develops novel statistical methodology for problems arising in kidney transplantation. First, we use functional principal component analysis (FPCA) through conditional expectation to explore the major sources of variation in glomerular filtration rate (GFR) curves. The estimated FPC scores can be used to cluster GFR curves, and ranking the FPC scores can detect abnormal GFR curves. FPCA can also effectively impute missing GFR values and predict GFR values. Second, we propose new joint models with mixed-effects and accelerated failure time (AFT) submodels, in which a piecewise linear function is used to estimate the non-proportional, dynamic hazard ratio curve of a time-dependent side event. The finite-sample performance of the proposed method is investigated in simulation studies, and the method is demonstrated by fitting the joint model to clinical kidney data. Third, we develop a joint model that combines FPCA with a multi-state model to fit longitudinal and multiple time-to-event outcomes together. FPCA efficiently reduces the dimension of the longitudinal trajectories, and the multi-state submodel describes the dynamic process of the multiple time-to-event outcomes. The relationships between the longitudinal and time-to-event outcomes can be assessed through the shared latent features. In the application example, the latent FPC scores are significantly related to the time-to-event outcomes, and the Cox model may produce biased results for multiple time-to-event outcomes compared with the multi-state model. Fourth, we develop a flexible class of joint generalized linear latent variable models for multivariate responses, with underlying Gaussian latent processes. The model accommodates any mixture of outcomes from the exponential family. A Monte Carlo EM algorithm is proposed to estimate the model parameters and the variance components of the latent processes. We demonstrate this methodology on kidney transplant studies. 
Finally, in many social and health studies, some covariates are measured only for groups (units) of subjects rather than for individuals. Such measures are referred to as aggregate average exposures. Current methods cannot evaluate high-order or nonlinear effects of aggregated exposures, so we develop a nonparametric method based on local linear fitting to overcome this difficulty. We demonstrate this methodology on kidney transplant studies.
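
The abstract names local linear fitting without giving the estimator, so the following is only a minimal sketch of ordinary local linear regression. The Gaussian kernel, the fixed bandwidth `h`, and the function names are assumptions for illustration; the sketch treats the exposure as observed per unit and does not model the aggregation step the thesis addresses.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (assumed)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_linear(x, y, x0, h):
    """Local linear estimate of E[Y | X = x0] with bandwidth h.

    Fits a kernel-weighted straight line around x0 and returns its intercept,
    using the closed-form weights (S2 - d_i * S1) / (S0 * S2 - S1**2).
    """
    d = [xi - x0 for xi in x]
    w = [gaussian_kernel(di / h) for di in d]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    denom = s0 * s2 - s1 * s1
    num = sum(wi * (s2 - di * s1) * yi for wi, di, yi in zip(w, d, y))
    return num / denom


# Usage: a smooth, nonlinear (hypothetical) exposure-response curve.
xs = [i / 50 for i in range(51)]
ys = [math.sin(2 * math.pi * xi) for xi in xs]
fit_mid = local_linear(xs, ys, 0.25, h=0.05)
```

Because the local fit is a line, any exactly linear exposure effect is reproduced without bias; curvature, such as a higher-order effect, is what the bandwidth trades off against variance.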

Document type: 
Thesis
File(s): 
Supervisor(s): 
Jiguo Cao
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Multidimensional scaling for phylogenetics

Author: 
Date created: 
2019-04-11
Abstract: 

We study a novel approach to determining the phylogenetic tree based on multidimensional scaling and the Euclidean Steiner minimum tree. A pairwise sequence alignment method is used to align objects such as DNA sequences, and evolutionary models are then applied to obtain the estimated distance matrix. Given a distance matrix, multidimensional scaling is widely used to reconstruct a map of the data points in a lower-dimensional space while preserving the distances between them. We apply both classical multidimensional scaling and Bayesian multidimensional scaling to the distance matrix to obtain the coordinates of the objects. Based on these coordinates, a Euclidean Steiner minimum tree can be constructed and serve as a candidate for the phylogenetic tree. The results of the simulation study indicate that using the Euclidean Steiner minimum tree as a phylogenetic tree is feasible.
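
The step from a distance matrix to coordinates can be illustrated with classical (Torgerson) MDS: double-center the squared distances and keep the leading eigenvectors scaled by the square roots of their eigenvalues. This is a minimal sketch, assuming Euclidean input distances (so the centered matrix is positive semidefinite and power iteration finds the largest eigenvalues); the function names are illustrative, and neither Bayesian MDS nor the Steiner tree step is shown.

```python
import math


def double_center(d):
    """Return B = -1/2 * J D2 J, where D2 holds the squared distances."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    means = [sum(row) / n for row in sq]  # row means == column means (symmetric)
    grand = sum(means) / n
    return [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
            for i in range(n)]


def top_eigenpair(b, iters=1000):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    n = len(b)
    v = [float((i + 1) ** 2) for i in range(n)]  # arbitrary fixed start vector
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def classical_mds(d, k=2):
    """Embed n objects in k dimensions from an n x n distance matrix."""
    b = double_center(d)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # clip tiny negative eigenvalues
        for i in range(n):
            coords[i][c] = v[i] * scale
        # Deflate so the next power iteration finds the next eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Usage: four points on a 3 x 1 rectangle, given only their distances.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
embedded = classical_mds(dist, k=2)
```

The recovered coordinates match the originals only up to translation, rotation, and reflection, which is all a distance-based tree construction needs.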

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Predicting ovarian cancer survival times: Feature selection and performance of parametric, semi-parametric, and random survival forest methods

Author: 
Date created: 
2019-04-23
Abstract: 

Survival time predictions have far-reaching implications. For example, such predictions can be influential in constructing a personalized treatment plan that is of benefit to both physicians and patients. Advantages include planning the best course of treatment considering the allocation of health care services and resources, as well as the patient's overall health or personal wishes. Predictions also play an important role in providing realistic expectations and subsequently managing quality of life for the patient's residual lifetime. Unfortunately, survival data can be highly variable, making precise predictions difficult or impossible. This project explores methods of predicting time to death for ovarian cancer patients. The dataset contains many predictors, some of which may be unimportant. We evaluate the performance of several prediction methods that allow for feature selection: the Weibull model, the Cox proportional hazards model, and the random survival forest. Prediction errors are assessed using Harrell's concordance index and a version of the expected integrated Brier score. We find that the Weibull and Cox models provide the best predictions of survival distributions in this context. Moreover, we are able to identify subsets of predictors that lead to reduced prediction error and are clinically meaningful.
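
Harrell's concordance index counts, among usable pairs of patients, how often the model ranks the earlier death as higher risk. The sketch below is a minimal version under stated assumptions: pairs with tied times are skipped, tied risk scores count as one half, and the data in the usage line are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; it is concordant when the model assigns i the
    higher risk score. Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


# Usage: hypothetical survival times, event indicators (1 = death observed),
# and risk scores, where a higher score predicts earlier death.
c_index = harrell_c([5, 8, 3, 10, 6], [1, 0, 1, 1, 0], [0.7, 0.2, 0.9, 0.1, 0.8])
```

For a Cox or Weibull model the linear predictor can serve as the risk score; if a method outputs predicted survival times instead, their negatives preserve the intended ordering.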

Document type: 
Graduating extended essay / Research project
Supervisor(s): 
Rachel Altman
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

An efficient statistical method of detecting introgressive events from big genomic data

Author: 
Date created: 
2019-04-09
Abstract: 

Introgressive hybridization, also called introgression, is the gene flow from one species to another due to mating between species. The genetic signals of introgression are not always readily observable. Current methods for detecting introgressive events rely on the analysis of orthologous markers and therefore do not account for gene duplication and gene loss. Since introgression leaves a phylogenetic signal similar to that of horizontal gene transfer, introgression events can be detected within a gene tree-species tree reconciliation framework, which simultaneously accounts for evolutionary mechanisms including gene duplication, gene loss, and gene transfer. In this work, the reconciliation-based method is applied to a large dataset of Anopheles mosquito genomes. We recover extensive introgression in the gambiae complex, a group of African mosquitoes, although with some differences from previous reports. Our results also suggest a possible ancient introgression between Asian and African mosquitoes.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Liangliang Wang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Fast emulation and calibration of large computer experiments with multivariate output

Author: 
Date created: 
2019-04-17
Abstract: 

Scientific investigations are often expensive, and the ability to quickly analyze data on location at experimental facilities can save valuable resources. Further, computer models that leverage scientific knowledge can be used to gain insight into complex processes and reduce the need for costly physical experiments, but they may in turn be computationally expensive to run. We compare multiple statistical surrogates, or emulators, based on Gaussian processes for expensive computer models, with the goal of producing predictions quickly given large training sets. We then present a modularised approach for finding the input values that allow the surrogate model to match reality, or field observations; this process is known as model calibration. Finally, we extend the chosen emulator and calibration procedure to multivariate responses and demonstrate the speed and efficacy of such emulators on datasets from a series of transmission impact experiments.
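
The two ideas in this abstract, a Gaussian-process emulator and calibration against a field observation, can be sketched for a single scalar input and output. This is a minimal illustration only: the zero prior mean, the squared-exponential kernel with fixed hyperparameters, the small nugget, the grid-search calibration, and the toy simulator `t**2` are all assumptions, and the sketch ignores model discrepancy, observation error, the fast large-training-set emulators, and the multivariate extension.

```python
import math


def sq_exp(a, b, length=0.5, var=1.0):
    """Squared-exponential covariance between two scalar inputs."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_emulator(xs, ys, nugget=1e-10):
    """Return the zero-mean GP posterior-mean predictor trained on (xs, ys)."""
    n = len(xs)
    k = [[sq_exp(xs[i], xs[j]) + (nugget if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = chol_solve(cholesky(k), ys)
    return lambda x: sum(sq_exp(x, xi) * ai for xi, ai in zip(xs, alpha))


def calibrate(emulator, y_field, grid):
    """Grid search for the input whose emulated output best matches y_field."""
    return min(grid, key=lambda t: (emulator(t) - y_field) ** 2)


# Usage: emulate a hypothetical "expensive" simulator f(t) = t**2 from 5 runs,
# then calibrate t against a single field observation y = 1.
design = [0.0, 0.5, 1.0, 1.5, 2.0]
runs = [t ** 2 for t in design]
emulate = gp_emulator(design, runs)
theta_hat = calibrate(emulate, y_field=1.0, grid=[2 * i / 200 for i in range(201)])
```

Keeping the emulator and the calibration step as separate functions loosely mirrors the idea of a modular approach: the emulator is fit once and then reused inside the cheap calibration search.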

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Derek Bingham
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Rao-Blackwellizing field-goal percentage

Date created: 
2019-03-29
Abstract: 

Shooting skill in the NBA is typically measured by field goal percentage (FG%) - the number of makes out of the total number of shots. Even more advanced metrics like true shooting percentage are calculated by counting each player’s 2-point, 3-point, and free throw makes and misses, ignoring the spatiotemporal data now available (Kubatko et al. 2007). In this paper we aim to better characterize player shooting skill by introducing a new estimator based on post-shot release shot-make probabilities. Via the Rao-Blackwell theorem, we propose a shot-make probability model that conditions probability estimates on shot trajectory information, thereby reducing the variance of the new estimator relative to standard FG%. We obtain shooting information by using optical tracking data to estimate three factors for each shot: entry angle, shot depth, and left-right accuracy. Next, we use these factors to model shot-make probabilities for all shots in the 2014-15 season, and use these probabilities to produce a Rao-Blackwellized FG% estimator (RB-FG%) for each player. We present a variety of results derived from this shot trajectory data, as well as demonstrate that RB-FG% is better than raw FG% at predicting 3-point shooting and true-shooting percentages. Overall, we find that conditioning shot-make probabilities on spatial trajectory information stabilizes inference of FG%, creating the potential to estimate shooting statistics and related metrics earlier in a season than was previously possible.
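
The Rao-Blackwell step replaces each 0/1 shot outcome with its conditional make probability given the shot trajectory and averages those probabilities. The sketch below assumes, purely for illustration, that these conditional probabilities are known exactly and drawn uniformly from [0.2, 0.8]; in the project they are estimated from entry angle, shot depth, and left-right accuracy. The function names are hypothetical.

```python
import random
import statistics


def raw_fg_pct(makes):
    """Standard FG%: fraction of shots made (makes are 0/1)."""
    return sum(makes) / len(makes)


def rb_fg_pct(make_probs):
    """Rao-Blackwellized FG%: average of P(make | shot trajectory)."""
    return sum(make_probs) / len(make_probs)


def simulate(n_shots=50, n_reps=2000, seed=1):
    """Compare both estimators over repeated hypothetical samples of shots."""
    rng = random.Random(seed)
    raw, rb = [], []
    for _ in range(n_reps):
        # Hypothetical trajectory-conditional make probabilities, one per shot.
        probs = [rng.uniform(0.2, 0.8) for _ in range(n_shots)]
        makes = [1 if rng.random() < p else 0 for p in probs]
        raw.append(raw_fg_pct(makes))
        rb.append(rb_fg_pct(probs))
    return raw, rb


raw, rb = simulate()
print(statistics.pvariance(raw), statistics.pvariance(rb))
```

Both estimators target the same mean, but the Rao-Blackwellized one removes the Bernoulli noise of each make or miss, so its variance across repetitions is much smaller.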

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Luke Bornn
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Unsupervised learning on functional data with an application to the analysis of U.S. temperature prediction accuracy

Author: 
Date created: 
2019-02-07
Abstract: 

Unsupervised learning techniques are widely applied in exploratory analysis to motivate further analysis. In functional data analysis, two typical topics of unsupervised learning are functional principal component analysis and functional data clustering. In this study, in addition to reviewing existing unsupervised learning techniques, we extend the unsupervised random forest clustering method to functional data and assess its strengths and weaknesses through comparisons with other clustering methods in simulation studies. Finally, both the proposed method and existing unsupervised learning techniques are applied to a real data example: the analysis of the accuracy of U.S. temperature predictions from 2014 to 2017.

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.

Development of functional principal components analysis and estimating the time-varying gene regulation network

Author: 
Date created: 
2018-09-27
Abstract: 

Functional data analysis (FDA) addresses the analysis of information on curves or functions. Examples include time-course gene expression measurements, electroencephalography (EEG) data monitoring brain activity, the emission rate of automobiles after acceleration, and children's body fat percentage measured over a growth period. The primary interest in the underlying curves or functions varies across fields. In this thesis, a new methodology for constructing time-varying networks from functional observations is proposed. Several variations of functional principal component analysis (FPCA) are developed in the context of functional regression models. Lastly, new uses of FPCA are explored for recovering trajectory functions and estimating derivatives.
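
Basic FPCA, and the idea of recovering a trajectory from its FPC score, can be sketched for densely observed curves on a common, equally spaced grid: estimate the mean curve and sample covariance, take the leading eigenvector, and project each centered curve onto it. This is a minimal sketch under those assumptions; it uses plain inner products without quadrature weights, finds only the first component by power iteration, and does not reproduce the FPCA variations developed in the thesis. The function names and simulated curves are hypothetical.

```python
import math


def fpca_first_component(curves, iters=500):
    """First FPC of curves observed on a common, equally spaced grid.

    Returns the mean curve, the leading eigenvalue, the eigenvector
    (discretized eigenfunction), and the FPC score of each curve.
    """
    n, m = len(curves), len(curves[0])
    mean = [sum(c[k] for c in curves) / n for k in range(m)]
    centered = [[c[k] - mean[k] for k in range(m)] for c in curves]
    cov = [[sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(m)]
           for j in range(m)]
    v = [1.0 + 0.1 * k for k in range(m)]  # arbitrary start for power iteration
    for _ in range(iters):
        w = [sum(cov[j][k] * v[k] for k in range(m)) for j in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all curves identical: no variation to explain
        v = [x / norm for x in w]
    lam = sum(v[j] * sum(cov[j][k] * v[k] for k in range(m)) for j in range(m))
    if sum(v) < 0:  # eigenvectors are defined up to sign; fix one convention
        v = [-x for x in v]
    scores = [sum(r[k] * v[k] for k in range(m)) for r in centered]
    return mean, lam, v, scores


def reconstruct(mean, v, score):
    """Recover a trajectory from the mean curve and its first FPC score."""
    return [mu + score * vk for mu, vk in zip(mean, v)]


# Usage: curves = linear trend + random amplitude * sin(pi t) on 11 grid points.
grid = [k / 10 for k in range(11)]
amps = [-2.0, -1.0, 0.0, 1.0, 2.0]
curves = [[1.0 + t + a * math.sin(math.pi * t) for t in grid] for a in amps]
mean, lam, v, scores = fpca_first_component(curves)
```

Here all variation lies along one direction, so a single component reconstructs every curve; real data would need several components, each found after deflating the covariance.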

Document type: 
Thesis
File(s): 
Supervisor(s): 
Jiguo Cao
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Construction of orthogonal designs and baseline designs

Date created: 
2018-07-23
Abstract: 

In this thesis, we study the construction of designs for computer experiments and for screening experiments. We consider the existence and construction of orthogonal designs, which are a useful class of designs for computer experiments. We first establish a non-existence result on orthogonal designs, generalizing an early result on orthogonal Latin hypercubes, and then present some construction results. By computer search, we obtain a collection of orthogonal designs with small run sizes. Using these results and existing methods in the literature, we create a comprehensive catalogue of orthogonal designs for up to 100 runs. In the rest of the thesis, we study designs for screening experiments. We propose two classes of compromise designs for estimation of main effects using two-level fractional factorial designs under baseline parameterization. Previous work in the area indicates that orthogonal arrays are more efficient than one-factor-at-a-time designs, whereas the latter are better than the former at minimizing the bias due to non-negligible interactions. Using efficiency criteria, we examine a class of compromise designs, which are obtained by adding runs to one-factor-at-a-time designs. A theoretical result is established for the case of adding one run. For adding two or more runs, we develop a complete search algorithm for finding optimal compromise designs. We also investigate another class of compromise designs, which are constructed from orthogonal arrays by changing some ones to zeros in design matrices. For small run sizes, we use a complete search to obtain optimal compromise designs. When the complete search is not feasible, we propose an efficient, though incomplete, search algorithm.
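
The bias trade-off mentioned for screening designs can be shown on a toy example under baseline (0/1) parameterization. A one-factor-at-a-time design estimates each main effect as a single run minus the baseline run, so interaction terms never enter. Least squares on a main-effects model fitted to an orthogonal array absorbs part of a non-negligible interaction. This sketch assumes a hypothetical true model with an `x1*x2` interaction and uses the full 2^3 factorial as the simplest orthogonal array. It shows only bias, not the efficiency side, and does not construct the compromise designs studied in the thesis.

```python
from itertools import product


def ofat_design(m):
    """Baseline run (all factors at 0) plus one run per factor set to 1."""
    runs = [[0] * m]
    for j in range(m):
        run = [0] * m
        run[j] = 1
        runs.append(run)
    return runs


def ofat_effects(y):
    """Baseline-parameterization main effects: each run minus the baseline."""
    return [yj - y[0] for yj in y[1:]]


def factorial_effects(design, y):
    """Least-squares main effects (0/1 coding) for a balanced two-level design.

    With balanced, mutually orthogonal columns, each fitted slope equals
    mean(y | x_j = 1) - mean(y | x_j = 0).
    """
    effects = []
    for j in range(len(design[0])):
        hi = [yi for run, yi in zip(design, y) if run[j] == 1]
        lo = [yi for run, yi in zip(design, y) if run[j] == 0]
        effects.append(sum(hi) / len(hi) - sum(lo) / len(lo))
    return effects


def response(x):
    """Hypothetical true model with a non-negligible x1*x2 interaction."""
    return 5 + 2 * x[0] - 3 * x[1] + 1 * x[2] + 4 * x[0] * x[1]


ofat = ofat_design(3)
full = [list(r) for r in product([0, 1], repeat=3)]
print(ofat_effects([response(r) for r in ofat]))             # [2, -3, 1]
print(factorial_effects(full, [response(r) for r in full]))  # [4.0, -1.0, 1.0]
```

The one-factor-at-a-time estimates recover the true main effects (2, -3, 1) exactly, while the factorial estimates for the first two factors are each shifted by half the interaction coefficient.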

Document type: 
Thesis
File(s): 
Supervisor(s): 
Boxin Tang
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Thesis) Ph.D.

Quantifying inter-generational equity under different target benefit plan designs

Author: 
Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2018-06-20
Abstract: 

In this research, we investigate the value of inter-generational transfers under various target benefit plan designs. The contingent retirement benefits are decomposed into embedded options, and the risk-adjusted values of these options are calculated and compared across generations. For this purpose, an economic scenario generator is implemented: the dynamics of the economic variables are generated by a model that combines the first-order vector autoregressive model and the generalized autoregressive conditional heteroscedasticity process. A corresponding risk-neutral model is derived and estimated using the prices of financial assets; the latter is used to price the embedded options. We study four target benefit plans with different design elements. We find that inter-generational value transfers arise simply from joining the collective pension scheme, even without any intertemporal benefit smoothing design. Without an additional source of funding, we show that benefit security and stability can be achieved by adopting plan designs that allow temporary inter-generational subsidization, e.g., plan designs with a no-action range. We show that adding a symmetric no-action range can reduce the volatility of retirement benefits without triggering significant value transfers, at least under the assumption of a stationary demographic profile and when the simulation of economic scenarios starts from its long-term equilibrium level.
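
The scenario generator described above combines a first-order vector autoregression with GARCH-type conditional variances. The sketch below is a simplified, real-world-measure version under loudly stated assumptions: two hypothetical variables, made-up parameter values, Gaussian innovations, and independent univariate GARCH(1,1) shocks for each variable (no cross-correlation). It omits the risk-neutral model and the option pricing entirely.

```python
import math
import random


def simulate_var_garch(c, a, omega, alpha, beta, x0, n_steps, seed=0):
    """Simulate x_t = c + A x_{t-1} + eps_t with univariate GARCH(1,1) shocks.

    eps_{t,i} = sqrt(h_{t,i}) * z_{t,i} with independent z ~ N(0, 1), and
    h_{t+1,i} = omega_i + alpha_i * eps_{t,i}**2 + beta_i * h_{t,i}.
    """
    rng = random.Random(seed)
    d = len(c)
    # Start each conditional variance at its unconditional level
    # (requires alpha_i + beta_i < 1).
    h = [omega[i] / (1.0 - alpha[i] - beta[i]) for i in range(d)]
    x = list(x0)
    path = [list(x)]
    for _ in range(n_steps):
        eps = [math.sqrt(h[i]) * rng.gauss(0.0, 1.0) for i in range(d)]
        x = [c[i] + sum(a[i][j] * x[j] for j in range(d)) + eps[i]
             for i in range(d)]
        h = [omega[i] + alpha[i] * eps[i] ** 2 + beta[i] * h[i] for i in range(d)]
        path.append(list(x))
    return path


# Usage: two hypothetical economic variables whose long-run mean
# (I - A)^{-1} c equals [2, 2], simulated starting from that equilibrium.
A = [[0.5, 0.1], [0.0, 0.8]]
C = [0.8, 0.4]
path = simulate_var_garch(C, A, omega=[0.01, 0.02], alpha=[0.05, 0.1],
                          beta=[0.9, 0.85], x0=[2.0, 2.0], n_steps=120)
```

Starting at the equilibrium mean, with variances at their unconditional levels, corresponds to the "long-term equilibrium" starting condition under which the abstract's no-action-range result is stated; with all shock parameters set to zero the path stays exactly at that level.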

Document type: 
Graduating extended essay / Research project
File(s): 
Supervisor(s): 
Barbara Sanders
Jean-François Bégin
Department: 
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.