Object detection aims to find and localize objects of a specific class in images or videos. Because this task underlies image and video understanding, it has become one of the most popular topics in computer vision and pattern recognition. Object detection is essential not only to the study of computer vision, pattern recognition, and image processing, but also to applications in public safety, entertainment, and business. In this research, we address the problem in two focused areas: local feature design and boosting learning. Our research on local features can be summarized as a hierarchical structure with three levels, in which the features at each level capture different object characteristics. At the lower level, we investigate how to design effective binary features, which perform well for object categories with small intra-class variations. At the middle level, we integrate gradient information with structural information, which results in more discriminative gradient features. At the higher level, we discuss how to construct co-occurrence features. Using such features, we may obtain a classifier with high accuracy for general object detection.
After feature extraction, boosted classifiers are learned for the final decision. We work on two aspects to improve the effectiveness of boosting learning. First, we improve the discriminative ability of the weak classifiers through the proposed basis mapping, and we show that learning in the mapped space is more effective than learning in the original space. Second, we explore the efficiency-accuracy trade-off in boosting learning. The Generalization and Efficiency Balance (GEB) framework and the hierarchical weak classifier are designed for this purpose. As a result, the boosted classifiers not only achieve high accuracy but also generalize well and run efficiently.
The performance of the proposed local features and boosting algorithms is evaluated on benchmark datasets of faces, pedestrians, and general objects. The experimental results show that our work achieves better accuracy than methods using traditional features and machine learning algorithms.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Li, Ze-Nian