Predicting ovarian cancer survival times: Feature selection and performance of parametric, semi-parametric, and random survival forest methods

Date created: 
Censored data
Survival analysis
Feature selection

Survival time predictions have far-reaching implications. For example, such predictions can be influential in constructing a personalized treatment plan that is of benefit to both physicians and patients. Advantages include planning the best course of treatment considering the allocation of health care services and resources, as well as the patient's overall health or personal wishes. Predictions also play an important role in providing realistic expectations and subsequently managing quality of life for the patient's residual lifetime. Unfortunately, survival data can be highly variable, making precise predictions difficult or impossible. This project explores methods of predicting time to death for ovarian cancer patients. The dataset consists of a multitude of predictors, including some that may be unimportant. The performances of various prediction methods that allow for feature selection (the Weibull model, Cox proportional hazards model, and the random survival forest) are evaluated. Prediction errors are assessed using Harrell's concordance index and a version of the expected integrated Brier score.We find that the Weibull and Cox models provide the best predictions of survival distributions in this context. Moreover, we are able to identify subsets of predictors that lead to reduced prediction error and are clinically meaningful.

Document type: 
Graduating extended essay / Research project
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Rachel Altman
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.