Motivated by the data explosion in the automobile industry due to technological innovations, this report outlines how dimension reduction methods can be used in modelling automobile insurance claim amounts. The framework is based on a generalized linear model (GLM) with a Tweedie distribution. Three popular methods are discussed in detail: the stepwise method, principal component analysis (PCA) using the nonlinear iterative partial least squares (NIPALS) algorithm, and the partial least squares method. The effectiveness and predictive performance of the three methods are compared using a car insurance data example. The results show that a small number of latent variables can capture sufficient information in the explanatory variables and can be used to build a decent predictive model for loss costs. Our study confirms that, when multicollinearity exists in the dataset, using orthogonal latent variables generally results in better modelling performance than ordinary variable selection methods.
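As a rough illustration of how NIPALS extracts orthogonal latent variables from collinear explanatory variables, the following is a minimal sketch in Python with NumPy. It is not the report's implementation: the function name `nipals_pca`, the tolerance and iteration settings, and the synthetic data are all assumptions made for illustration, and the Tweedie GLM fitting step is not shown.

```python
import numpy as np

def nipals_pca(X, n_components, tol=1e-10, max_iter=1000):
    """Hypothetical helper: PCA scores T and loadings P via NIPALS."""
    X = np.asarray(X, dtype=float)
    R = X - X.mean(axis=0)              # centre columns; R is deflated below
    n, p = R.shape
    T = np.zeros((n, n_components))     # scores (latent variables)
    P = np.zeros((p, n_components))     # loadings (unit-length directions)
    for k in range(n_components):
        # Start from the residual column with the largest variance.
        t = R[:, np.argmax(R.var(axis=0))].copy()
        for _ in range(max_iter):
            p_k = R.T @ t / (t @ t)     # regress columns of R on the score
            p_k /= np.linalg.norm(p_k)  # normalise the loading vector
            t_new = R @ p_k             # project rows of R onto the loading
            converged = np.linalg.norm(t_new - t) <= tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        T[:, k] = t
        P[:, k] = p_k
        R = R - np.outer(t, p_k)        # deflate: remove the fitted component
    return T, P

# Synthetic example (assumed data): columns 0 and 1 are nearly collinear.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 2))
X = np.column_stack([z[:, 0], z[:, 0] + 0.01 * rng.normal(size=200), z[:, 1]])
T, P = nipals_pca(X, n_components=2)
```

The two score columns in `T` are mutually orthogonal, so they could stand in for the three original, collinear columns as predictors in a downstream model.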
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Lu, Yi