Statistical Inference under Latent Class Models, with Application to Risk Assessment in Cancer Survivorship Studies

Date created: 
Cross-Sectional and Longitudinal Analysis
Extended GEE Approach
Likelihood and Pseudo-Likelihood Estimation
Medical Cost
Physician Claims
Robust Variance Estimation

Motivated by a cancer survivorship program, this PhD thesis develops methodology for risk assessment, classification, and prediction. We use latent class models to formulate the primary data, which are collected from a cohort with two underlying categories, the at-risk and not-at-risk classes, and we conduct both cross-sectional and longitudinal analyses. We begin with a maximum pseudo-likelihood estimator (pseudo-MLE) as an alternative to the maximum likelihood estimator (MLE) for event counts under a mixture Poisson distribution. The pseudo-MLE uses supplementary information on the not-at-risk class from a different population. It reduces the computational burden and potentially increases estimation efficiency. To obtain methods that are more robust to distribution misspecification than likelihood-based methods, we adapt the well-established generalized estimating equations (GEE) approach under the mean-variance model corresponding to the mixture Poisson distribution. The inherent computing and efficiency issues in applying GEEs motivate two sets of extended GEEs: one uses the primary data supplemented by information from the second population alone, and the other also uses the available information on individuals in the cohort who are deemed to belong to the at-risk class. We derive asymptotic properties of the proposed pseudo-MLE and of the estimators from the extended GEEs, and we estimate their variances by extended Huber sandwich estimators. We use simulation to examine the finite-sample properties of the estimators in terms of both efficiency and robustness. The simulation studies support the consistency of the proposed parameter estimators and their variance estimators. They also show that the pseudo-MLE has efficiency comparable to that of the MLE, and that the extended GEE estimators are robust to distribution misspecification while maintaining satisfactory efficiency.
Further, we extend the preferred extended GEE estimator to longitudinal settings by adjusting for within-subject correlation. The proposed methodology is illustrated with physician claims from the cancer program. Applying the proposed estimators, we fit different latent class models for the counts and costs of physician visits. We use the parameter estimates to identify the risk of subsequent and ongoing problems arising from the subjects' initial cancer diagnoses, and we perform risk classification and prediction using the fitted latent class models.
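
As an illustration of the pseudo-likelihood idea described in the abstract, the following is a minimal sketch and not the thesis's actual estimator. It assumes a two-class Poisson mixture with no covariates, in which the not-at-risk rate is first estimated from a supplementary sample and then held fixed while the at-risk proportion and rate are estimated from the primary data. The EM iteration is a choice made here for simplicity, and all names (`pseudo_mle`, `simulate`, `lam0`, `lam1`, `p`) are hypothetical.

```python
import math
import random


def poisson_logpmf(y, lam):
    """Log of the Poisson probability mass at count y with rate lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def mixture_loglik(ys, p, lam1, lam0):
    """Log-likelihood of counts ys under p*Pois(lam1) + (1-p)*Pois(lam0)."""
    total = 0.0
    for y in ys:
        a = math.log(p) + poisson_logpmf(y, lam1)      # at-risk component
        b = math.log(1 - p) + poisson_logpmf(y, lam0)  # not-at-risk component
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def pseudo_mle(ys, supp, n_iter=200):
    """Two-step estimate: plug in lam0 from supp, then fit (p, lam1) by EM.

    Assumption: supp contains counts from the not-at-risk class only and
    has a positive mean.
    """
    lam0 = sum(supp) / len(supp)                # step 1: supplementary sample
    p = 0.5
    lam1 = max(sum(ys) / len(ys), lam0) + 1.0   # start above the fixed lam0
    for _ in range(n_iter):
        # E-step: posterior probability that each subject is at risk.
        ws = []
        for y in ys:
            a = math.log(p) + poisson_logpmf(y, lam1)
            b = math.log(1 - p) + poisson_logpmf(y, lam0)
            m = max(a, b)
            ws.append(math.exp(a - m) / (math.exp(a - m) + math.exp(b - m)))
        # M-step: update only p and lam1; lam0 stays fixed.
        sw = sum(ws)
        p = min(max(sw / len(ys), 1e-9), 1 - 1e-9)
        lam1 = sum(w * y for w, y in zip(ws, ys)) / sw
    return {"p": p, "lam1": lam1, "lam0": lam0}


def rpois(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        prod *= rng.random()
        if prod <= limit:
            return k - 1


def simulate(n, m, p, lam1, lam0, seed=0):
    """Simulate n primary counts from the mixture and m supplementary counts."""
    rng = random.Random(seed)
    ys = [rpois(rng, lam1 if rng.random() < p else lam0) for _ in range(n)]
    supp = [rpois(rng, lam0) for _ in range(m)]
    return ys, supp


if __name__ == "__main__":
    ys, supp = simulate(n=2000, m=2000, p=0.3, lam1=8.0, lam0=1.0, seed=1)
    print(pseudo_mle(ys, supp))
```

Fixing the not-at-risk rate from the supplementary sample reduces the second step to a two-parameter problem, which is one way to read the abstract's remark about reduced computation. The posterior weights from the E-step could also serve as a simple basis for classifying subjects as at-risk, again under the assumptions of this sketch.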

Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
X. Joan Hu
John J. Spinelli
Science: Department of Statistics and Actuarial Science
Thesis type: (Thesis) Ph.D.