Research on human action classification has made significant progress in the past few years. Most deep learning methods focus on improving performance by adding more network components. We propose instead to make better use of auxiliary mechanisms, including hierarchical classification, network pruning, and skeleton-based preprocessing, to boost model robustness and performance. We test the effectiveness of our method on five commonly used datasets: NTU RGB+D 60, NTU RGB+D 120, Northwestern-UCLA Multiview Action 3D, UTD Multimodal Human Action Dataset, and Kinetics 400, a challenging dataset that differs from the other four. Our experiments show that our method achieves comparable or better performance on each of the first four datasets. In particular, our method sets a new baseline for NTU RGB+D 120, the largest of the first four datasets. We also analyze our method through extensive comparisons and ablation studies.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Yin, KangKang