Learning structured models for human actions and poses

Author: Wang, Yang
A grand challenge of computer vision is to enable machines to ``see people''. A solution to this challenge will enable numerous applications in fields such as security, surveillance, entertainment, human-computer interaction, and biomechanics. This dissertation focuses on two problems in the general area of ``looking at people'': human pose estimation and human action recognition. The first problem is to identify the body parts of a person from a still image. The second problem is to recognize the actions of a person from a video sequence. We formulate the solutions to these problems as learning structured models. In particular, we propose models and algorithms to address the following structures: (1) human pose estimation as a structured output problem. We propose a boosted multiple tree model for modeling the spatial and occlusion constraints between human body parts; (2) temporal structure in human action recognition. We present two models based on the ``bag-of-words'' representation to capture the temporal structures of video sequences; (3) human action recognition as classification with hidden structures. We develop a model based on the hidden conditional random field to recognize human actions. We also propose a max-margin learning method for training the model. The learning method is general enough to be applied to many other problems in computer vision, and even to other areas of computer science.
