Resource type
Thesis type
(Thesis) Ph.D.
Date created
2018-11-20
Authors/Contributors
Author: Ibrahim, Moustafa
Abstract
Multi-person activity recognition is an important and challenging problem in computer vision, with applications such as visual surveillance and video summarization. For a long time, shallow architectures (e.g., SVMs) were used with manually extracted features for this task, but their performance was unsatisfactory because hand-engineered features are limited and may discard significant explanatory factors of the data. An alternative is to automatically learn features at multiple levels of abstraction from raw visual data using Deep Convolutional Neural Networks (DCNNs). In this thesis, we make three contributions toward human activity understanding based on DCNNs. 1) We propose hierarchical deep temporal models that automatically learn feature representations for individual person actions as well as for the whole group activity, while capturing the temporal dynamics that exist at both levels. 2) We investigate approaches for action localization, a critical sub-problem in multi-person activity recognition. 3) We introduce a graph-based network module for relational reasoning that captures hierarchical relationships among people in a video scene. Overall, the proposed models recognize the collective activity of individuals and their complex interactions by modeling different types of cues in a deep, hierarchical, temporal manner.
Document
Identifier
etd19937
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Mori, Greg
Member of collection
Download file | Size |
---|---|
etd19937.pdf | 34.4 MB |