Heavy label noise is present in many practical scenarios, where the observed labels of instances are corrupted. Classification under heavy label noise is of great significance and has attracted considerable attention, since label noise can have many negative consequences. Many state-of-the-art approaches assume that label noise is class-dependent, and thus cannot be generalized to situations where this assumption does not hold. In this thesis, we propose a Markov chain sampling framework, MCS, to overcome these limitations of existing methods in the binary classification problem. The main idea is to combine the predictions of a sequence of classifiers as an ensemble to detect mislabeled instances. The classifiers in the sequence are trained on different subsets of the training data, obtained by sampling the states of a carefully designed Markov chain with a random walk. The proposed MCS framework is general and can accommodate a wide spectrum of classification algorithms. We theoretically prove the correctness and effectiveness of the MCS framework. We further present experimental results showing the effectiveness and efficiency of the proposed framework and the algorithms derived from it.
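The sampling-and-voting idea can be illustrated with a small Python sketch. It is not the MCS algorithm from the thesis: the abstract does not specify the Markov chain, so the sketch assumes a simple swap random walk over fixed-size training subsets (each step exchanges one member for one non-member, so the walk's long-run distribution is uniform over those subsets), a hypothetical 1-D nearest-centroid base learner, voting only on instances held out of each subset, and a 0.5 disagreement threshold. The names `train_nearest_centroid`, `swap_random_walk`, and `detect_mislabeled` are illustrative only.

```python
import random
from typing import Callable, Iterator, List, Sequence

# A trained classifier maps a 1-D feature value to a label in {0, 1}.
Classifier = Callable[[float], int]


def train_nearest_centroid(xs: Sequence[float], ys: Sequence[int]) -> Classifier:
    """Hypothetical base learner: 1-D nearest-centroid classifier for labels {0, 1}."""
    groups = {c: [x for x, y in zip(xs, ys) if y == c] for c in (0, 1)}
    present = [c for c in (0, 1) if groups[c]]
    if len(present) == 1:
        only = present[0]
        return lambda x: only  # subset saw a single class: always predict it
    m0 = sum(groups[0]) / len(groups[0])
    m1 = sum(groups[1]) / len(groups[1])
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def swap_random_walk(n: int, k: int, steps: int, rng: random.Random) -> Iterator[frozenset]:
    """Yield the k-subsets of range(n) visited by an assumed swap random walk.

    Each step removes one uniformly chosen member and adds one uniformly chosen
    non-member, so every state has k * (n - k) equally likely neighbours.
    """
    state = set(rng.sample(range(n), k))
    for _ in range(steps):
        yield frozenset(state)
        leaving = rng.choice(sorted(state))
        entering = rng.choice(sorted(set(range(n)) - state))
        state.remove(leaving)
        state.add(entering)


def detect_mislabeled(xs, ys, k, train=train_nearest_centroid,
                      steps=500, burn_in=50, threshold=0.5, seed=0) -> List[int]:
    """Flag instances whose held-out ensemble votes mostly disagree with their label."""
    rng = random.Random(seed)
    n = len(xs)
    wrong = [0] * n  # held-out votes that disagree with the observed label
    total = [0] * n  # held-out votes cast for each instance
    for t, subset in enumerate(swap_random_walk(n, k, burn_in + steps, rng)):
        if t < burn_in:
            continue  # discard early states visited before the walk has mixed
        members = sorted(subset)
        clf = train([xs[i] for i in members], [ys[i] for i in members])
        for i in range(n):
            if i not in subset:  # vote only on instances excluded from training
                total[i] += 1
                wrong[i] += int(clf(xs[i]) != ys[i])
    return [i for i in range(n) if total[i] > 0 and wrong[i] / total[i] > threshold]


# Toy data: two well-separated clusters; the last instance's label is flipped.
xs = [0.0, 0.5, 1.0, -0.5, 10.0, 10.5, 9.5, 11.0, 9.0, 0.2]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(detect_mislabeled(xs, ys, k=5))  # → [9]
```

Because the base learner is passed in as the `train` argument, any function that returns a label-predicting callable could be substituted, loosely mirroring the claim that the framework accommodates many classification algorithms.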
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Pei, Jian