Wang, Yang

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2009

Authors/Contributors

Author (aut): Wang, Yang

Abstract

A grand challenge of computer vision is to enable machines to "see people". A solution to this challenge will enable numerous applications in various fields, e.g., security, surveillance, entertainment, human-computer interaction, and biomechanics. This dissertation focuses on two problems in the general area of "looking at people": human pose estimation and human action recognition. The first problem is to identify the body parts of a person from a still image. The second problem is to recognize the actions of the person from a video sequence. We formulate the solutions to these problems as learning structured models. In particular, we propose models and algorithms to address the following structures: (1) human pose estimation as a structured output problem. We propose a boosted multiple tree model for modeling the spatial and occlusion constraints between human body parts; (2) temporal structure in human action recognition. We present two models based on the "bag-of-words" representation to capture the temporal structures of video sequences; (3) human action recognition as classification with hidden structures. We develop a model based on the hidden conditional random field to recognize human actions. We also propose a max-margin learning method for training the model. The learning method is general enough to be applied to many other applications in computer vision, and even to other areas of computer science.

Keywords

Copyright statement

Copyright is held by the author.

Scholarly level

Graduate student (PhD)

Language

English

Member of collection

Computing Science Theses

Download file	Size
ETD4742.pdf	9.54 MB

Learning structured models for human actions and poses
