Pedotransfer functions: Improving predictions through machine learning and nonlinear least squares approaches coupled with quantile regression

Thesis type: (Thesis) M.Sc.
Digital soil mapping requires input data, which are often sourced from legacy soil datasets. These datasets may be incomplete, so pedotransfer functions (PTFs) are needed to estimate the missing soil attribute values. Two methods for increasing the accuracy of PTFs are explored: using nonlinear least squares (NLS) to recalibrate existing equation-based functions, and using the machine learner Random Forest (RF) to develop new PTFs. The target attribute used as a case study was bulk density (BD), a soil variable often missing from legacy soil datasets. To test how effectively NLS recalibrates existing PTFs, 73 PTFs from the literature were tested on three regional datasets, two from British Columbia (BC) and one from Ontario. Improvement in accuracy was gauged by comparing root mean square error (RMSE) and concordance correlation coefficient (CCC) values determined before and after recalibration. The accuracy of almost every PTF improved; PTFs with fewer variables and those recalibrated on the largest dataset showed the highest accuracy. RF was also used to develop new PTFs. Eleven variables were available in the legacy dataset from BC, the case study region, and all possible combinations of these variables were used to create 512 models for predicting BD. After testing, the models were ranked by their CCC values, which ranged from 0.92 for the best-performing model to 0.51 for the lowest-ranked model. The number of horizons each model could estimate also varied, because many of the variables had limited availability. To estimate the missing BD values in the dataset, models were selected based on their performance and the number of horizons they could estimate; 27 models were used in total. Lastly, because most existing PTFs lack accompanying uncertainty estimates, quantile regression (QR) was used to address this gap.
PTF uncertainty was shown to be related to the size of the training dataset and to the input variables. A framework was constructed that coupled QR with both PTF recalibration and PTF development, producing region-specific PTFs along with uncertainty estimates; these predictions were used to fill gaps in legacy soil datasets.
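The NLS recalibration and the before/after RMSE and CCC comparison described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's code: the exponential PTF form, its "published" coefficients and the synthetic data are all hypothetical, and SciPy's `curve_fit` stands in for whatever NLS solver was actually used. CCC is Lin's concordance correlation coefficient, computed here with population moments.

```python
import numpy as np
from scipy.optimize import curve_fit


def ptf(oc, a, b, c):
    """Hypothetical equation-based PTF: BD (g/cm^3) from organic carbon OC (%)."""
    return a + b * np.exp(-c * oc)


def rmse(obs, pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


def ccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = obs.mean(), pred.mean()
    sxy = np.mean((obs - mx) * (pred - my))
    return float(2 * sxy / (obs.var() + pred.var() + (mx - my) ** 2))


# Synthetic regional data (hypothetical, for illustration only).
rng = np.random.default_rng(42)
oc = rng.uniform(0.1, 10.0, 300)
bd = ptf(oc, 1.0, 0.6, 0.3) + rng.normal(0.0, 0.05, oc.size)

# Coefficients as "published" for some other region (hypothetical).
published = (1.1, 0.5, 0.2)
pred_before = ptf(oc, *published)

# Recalibrate: keep the equation form, re-estimate its coefficients by NLS,
# starting the search from the published values.
fitted, _ = curve_fit(ptf, oc, bd, p0=published)
pred_after = ptf(oc, *fitted)

rmse_before, rmse_after = rmse(bd, pred_before), rmse(bd, pred_after)
ccc_before, ccc_after = ccc(bd, pred_before), ccc(bd, pred_after)
print(f"RMSE {rmse_before:.3f} -> {rmse_after:.3f}")
print(f"CCC  {ccc_before:.3f} -> {ccc_after:.3f}")
```

The Levenberg-Marquardt search behind `curve_fit` only accepts steps that reduce the sum of squares, so RMSE on the fitting data cannot get worse than the published coefficients; a held-out split would give a fairer before/after comparison.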
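The variable-subset step for the RF models can be illustrated without the model fitting itself. In this minimal sketch, each non-empty combination of predictors would train one RF model (training is omitted), and a model can only estimate horizons where all of its inputs are recorded. The gap-filling rule, which gives each horizon the highest-ranked model it has inputs for, is an assumption for illustration, as are the variable names, the toy horizons and the made-up CCC scores.

```python
from itertools import combinations

VARIABLES = ("oc", "clay", "depth")  # hypothetical predictor names

# Toy legacy horizons; None marks a missing measurement.
horizons = [
    {"oc": 2.1, "clay": 18.0, "depth": 10.0},
    {"oc": 0.8, "clay": None, "depth": 35.0},
    {"oc": None, "clay": 25.0, "depth": 60.0},
    {"oc": 4.5, "clay": 12.0, "depth": None},
]


def all_subsets(variables):
    """Every non-empty combination of predictors (one candidate model each)."""
    return [c for k in range(1, len(variables) + 1)
            for c in combinations(variables, k)]


def can_estimate(horizon, subset):
    """A model can be applied only if all of its inputs are present."""
    return all(horizon[v] is not None for v in subset)


def coverage(subset, rows):
    """Number of horizons a model built on `subset` could estimate."""
    return sum(can_estimate(h, subset) for h in rows)


def assign_models(rows, ranked):
    """For each horizon, pick the best-ranked model it has inputs for."""
    return [next((s for s in ranked if can_estimate(h, s)), None) for h in rows]


subsets = all_subsets(VARIABLES)
# Made-up CCC scores standing in for validation results.
scores = {
    ("oc", "clay", "depth"): 0.90, ("oc", "clay"): 0.85,
    ("oc", "depth"): 0.80, ("clay", "depth"): 0.75,
    ("oc",): 0.70, ("clay",): 0.60, ("depth",): 0.55,
}
ranked = sorted(subsets, key=scores.get, reverse=True)

for s in ranked:
    print(f"{'+'.join(s):15s} CCC={scores[s]:.2f} horizons={coverage(s, horizons)}")

chosen = assign_models(horizons, ranked)
print("models used:", len(set(chosen)))
```

On this toy data the best model can estimate only one horizon, so four of the seven candidates are needed to fill every gap, a small-scale version of the trade-off between accuracy and coverage described above.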
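Quantile regression estimates conditional quantiles rather than the mean by minimising the pinball (check) loss, rho_tau(r) = tau*r for r >= 0 and (tau - 1)*r for r < 0, so fitting the 5th and 95th percentiles gives a 90% prediction interval around each BD estimate. The sketch below uses plain linear QR solved as a linear program with SciPy's `linprog`; this is a swapped-in illustration, not necessarily the QR variant used in the thesis, and the `fit_quantile` helper and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_quantile(X, y, tau):
    """Linear quantile regression: minimise pinball loss as a linear program."""
    n, p = X.shape
    # Decision vector [beta (p, free), u (n, >= 0), v (n, >= 0)] with y - X beta = u - v.
    cost = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Synthetic data (hypothetical): BD scatter grows with organic carbon.
rng = np.random.default_rng(0)
n = 200
oc = rng.uniform(0.1, 8.0, n)
bd = 1.6 - 0.08 * oc + rng.normal(0.0, 1.0, n) * (0.03 + 0.01 * oc)
X = np.column_stack([np.ones(n), oc])  # intercept + OC

beta_lo = fit_quantile(X, bd, 0.05)
beta_med = fit_quantile(X, bd, 0.50)
beta_hi = fit_quantile(X, bd, 0.95)

lower, median, upper = X @ beta_lo, X @ beta_med, X @ beta_hi
coverage = float(np.mean((bd >= lower) & (bd <= upper)))


def width(at_oc):
    """Width of the 90% interval at a given OC value."""
    return float((beta_hi - beta_lo) @ np.array([1.0, at_oc]))


print(f"first horizon: BD {median[0]:.3f} in [{lower[0]:.3f}, {upper[0]:.3f}]")
print(f"90% interval covers {coverage:.1%} of training horizons")
print(f"interval width at OC=1: {width(1.0):.3f}, at OC=7: {width(7.0):.3f}")
```

At the LP optimum, at most a fraction tau of the fitting points can lie strictly below the fitted tau-quantile line, so the interval covers roughly 90% of the training data; it widens with OC because the simulated scatter does.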
160 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Schmidt, Margaret
Thesis advisor: Heung, Brandon