Incorporating statistical clustering methods into mortality models to improve forecasting performances

Date created: 
Lee-Carter model
CBD model
Hierarchical clustering
K-means clustering
Gaussian mixture model clustering
Bayesian model selection

Statistical clustering is a procedure for classifying a set of objects such that objects in the same class (called a cluster) are more homogeneous with one another, with respect to some features or characteristics, than with objects in other classes. In this project, we apply four clustering approaches to improve the forecasting performance of the Lee-Carter and CBD models. First, each of four clustering methods (Ward's hierarchical clustering, divisive hierarchical clustering, K-means clustering, and Gaussian mixture model clustering) is adopted to determine, based on some characteristics of mortality rates, the number and members of age subgroups within the whole group of ages 25-84. Next, we forecast 10-year and 20-year mortality rates for each age subgroup using the Lee-Carter and CBD models in turn. Finally, numerical illustrations are given, with the R packages "NbClust" and "mclust" used for clustering. Mortality data for both genders in the US and the UK are obtained from the Human Mortality Database, and the mean absolute percentage error (MAPE) is adopted to evaluate forecasting performance. Comparisons of MAPE values with and without clustering demonstrate that all the proposed clustering methods can improve the forecasting performance of the Lee-Carter and CBD models.
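
As a rough illustration of the evaluation measure named in the abstract, the MAPE can be sketched in a few lines of Python. This is a minimal sketch and not the project's own code (the project's numerical work uses R). It assumes the usual definition, the mean of |observed - forecast| / observed expressed in percent. The function name `mape` and the sample values are hypothetical and are not taken from the Human Mortality Database.

```python
def mape(observed, forecast):
    """Mean absolute percentage error, in percent (assumed definition)."""
    if len(observed) != len(forecast) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the absolute relative errors, then scale to a percentage.
    total = sum(abs((o - f) / o) for o, f in zip(observed, forecast))
    return 100.0 * total / len(observed)


# Hypothetical mortality rates for three ages (illustrative values only).
observed_rates = [0.010, 0.020, 0.040]
forecast_rates = [0.011, 0.018, 0.040]

print(f"MAPE: {mape(observed_rates, forecast_rates):.2f}%")
```

Under this definition, a comparison with and without clustering amounts to computing the measure once for forecasts fitted on the whole age group and once for forecasts fitted per age subgroup, with a lower value indicating a better forecast.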

Document type: 
Graduating extended essay / Research project
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Cary Chi-Liang Tsai
Science: Department of Statistics and Actuarial Science
Thesis type: 
(Project) M.Sc.