In recent years, meta-learning has gained significant attention as a means of recommending machine learning algorithms. These recommendations rely on meta-features, which quantify the characteristics of input datasets. However, existing meta-features are predominantly designed for classification tasks, leaving their usefulness for regression largely unexplored. This thesis addresses this gap by identifying seven data properties that may be crucial in differentiating regression algorithms and by proposing a set of meta-features designed to capture these properties. To evaluate the efficacy of these meta-features, we conduct a simulation study that investigates how well they reflect the targeted data properties. The simulation systematically manipulates key factors, including data linearity, true error variance, the proportion of relevant explanatory variables, and the signal-to-noise ratio, among others. This design lets us examine how each meta-feature responds to changes in its targeted data property and whether it is both sensitive and specific to that property. By analyzing the correlation between each data property and its corresponding meta-features, and by accounting for computation time, we show that many of these measures have strong discriminative power without imposing excessive computational cost.
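The evaluation logic described above (vary one data property in simulation, compute a meta-feature, then check how strongly the two are correlated) can be illustrated with a small sketch. This is not the thesis's method: the abstract does not specify its data generators or meta-features. The function names `simulate_regression`, `r2_meta_feature` and `spearman`, the parameter values, and the choice of in-sample OLS R² as a stand-in meta-feature for the signal-to-noise ratio are all hypothetical assumptions made for illustration.

```python
import numpy as np


def simulate_regression(n, p, prop_relevant, snr, rng):
    """Hypothetical generator: linear data with a chosen signal-to-noise ratio.

    Assumption: standard-normal predictors, the first k coefficients equal 1,
    the remaining coefficients equal 0, and Gaussian errors.
    """
    X = rng.standard_normal((n, p))
    k = max(1, int(round(prop_relevant * p)))  # number of relevant predictors
    beta = np.zeros(p)
    beta[:k] = 1.0
    signal = X @ beta
    # Choose the error variance so that var(signal) / sigma^2 equals snr.
    sigma2 = signal.var() / snr
    y = signal + rng.normal(0.0, np.sqrt(sigma2), size=n)
    return X, y


def r2_meta_feature(X, y):
    """Illustrative stand-in meta-feature: in-sample R^2 of an OLS fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    # With an intercept the residuals have mean zero, so var ratio = SSR / SST.
    return 1.0 - resid.var() / y.var()


def spearman(a, b):
    """Spearman rank correlation via ranks (assumes no ties, for simplicity)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]


rng = np.random.default_rng(0)
snrs = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0])  # manipulated data property
features = np.array(
    [r2_meta_feature(*simulate_regression(500, 10, 0.5, s, rng)) for s in snrs]
)
rho = spearman(snrs, features)
print(f"Spearman correlation between SNR and the R^2 meta-feature: {rho:.2f}")
```

A rank correlation is used here because a useful meta-feature only needs to move monotonically with its target property, not linearly. A fuller study in the spirit of the abstract would also vary the other factors (linearity, proportion of relevant variables) to check that the meta-feature does not respond to them, which is the specificity side of the evaluation.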
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Loughin, Thomas