Motivated partly by the analysis of lightning-caused wildfire data from Alberta, this dissertation develops statistical methodology for analyzing event times with missing origins, aided by auxiliary information such as associated longitudinal measures and other relevant information available prior to the time origin. We begin with an analysis of the motivating data to estimate the distribution of the time to initial attack, i.e., the duration between the time a wildfire starts burning with flames and the time of its initial attack, using two conventional approaches: one neglects the missing origin and performs inference on the observed portion of the duration, and the other treats the event time of interest as interval-censored with a pre-determined interval. The counterintuitive or non-informative results of this preliminary analysis lead us to propose new approaches to the issue of the missing origin. To facilitate methodological development, we first consider estimation of the duration distribution with independent and identically distributed (iid) observations. We link the unobserved time origin to the available longitudinal measures of burnt areas via a first-hitting-time model. This yields an intuitive and easy-to-implement adaptation of the empirical distribution function to the event time data. We establish consistency and weak convergence of the proposed estimator and present an estimator of its variance. We then extend the proposed approach to study the association of the duration with a list of potential risk factors. A semi-parametric accelerated failure time (AFT) regression model is considered together with a Wiener process model with random drift for the longitudinal measures. Further, we accommodate potential spatial correlation among the wildfires by specifying the drift of the Wiener process as a function of covariates and spatially correlated random effects.
Moreover, we propose a method that uses lightning data to aid estimation of the duration distribution. It leads to an alternative approach to estimating the distribution of the duration by adapting the Turnbull estimator for interval-censored observations. A notable byproduct of this approach is an estimation procedure for the distribution of ignition time that uses all the lightning data and the sub-sampled data. The finite-sample performance of the proposed approaches is examined via simulation studies. We use the motivating Alberta wildfire data to illustrate the proposed approaches throughout the thesis. The data analyses and simulation studies show that, with the current data structure, the two conventional approaches can give rise to misleading inference. The proposed approaches provide intuitive, easy-to-implement alternatives for the analysis of event times with missing origins. We anticipate that the methodology has many applications in practice, for example in infectious disease research.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.