Ibrahim, Moustafa

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2018-11-20

Authors/Contributors

Author: Ibrahim, Moustafa

Abstract

Multi-person activity recognition is an important and challenging problem in computer vision, with applications such as visual surveillance and video summarization. For a long time, shallow architectures (e.g., SVMs) were used with manually extracted features to answer the intended queries, but their performance was unsatisfactory because hand-engineered features are limited and may drop significant explanatory factors of the data. An alternative is to automatically learn features at multiple levels of abstraction from raw visual data with Deep Convolutional Neural Networks (DCNNs). In this thesis we make three contributions toward human activity understanding based on DCNNs. 1) We propose hierarchical deep temporal models that automatically learn feature representations for individual person actions as well as for the whole group activity, while capturing the temporal dynamics that exist at both levels. 2) We investigate approaches for action localization, a critical sub-problem in multi-person activity recognition. 3) We introduce a graph-based network module for relational reasoning that captures hierarchical relationships among people in a video scene. Overall, the proposed models recognize the collective activity of individuals and their complex interactions by modeling different types of cues in a deep, hierarchical, temporal manner.

Keywords

Identifier

etd19937

Copyright statement

Copyright is held by the author.

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Scholarly level

Graduate student (PhD)

Supervisor or Senior Supervisor

Thesis advisor: Mori, Greg

Member of collection

Computing Science Theses

Download file	Size
etd19937.pdf	34.4 MB

Deep models for multi-person activity understanding
