Resource type
Thesis
Thesis type
(Thesis) M.Sc.
Date created
2014-03-11
Authors/Contributors
Author: Sefidgar, Yasaman Sadat
Abstract
Automatic activity detection in videos has several applications in visual surveillance, video retrieval, and human-computer interaction. At its core, the task requires expressive models of activities. Models that represent activities as arrangements of key components are generally more descriptive and more robust to challenges such as occlusion, clutter, and high intra-class variability, and can thus lead to improved classification performance. Following this idea, we model human-object interactions as sequences of locally discriminative temporal segments that capture object appearance and inter-object relations. In a two-stage pipeline, we first coarsely localize humans and objects in long videos; we then examine their content more closely using our key-segment model, trained in the latent SVM framework. We evaluate our approach on the VIRAT Ground Dataset Release 2.0 for detecting instances of human-vehicle interactions. Results show that our key-segment model significantly outperforms the common global Bag-of-Words approach.
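The "key-segment model trained in the latent SVM framework" mentioned in the abstract can be read against the standard latent SVM formulation. The sketch below states that general formulation only; interpreting the latent variable as the assignment of discriminative temporal key segments is an assumption about this thesis, not its exact notation. Here x denotes a candidate human-object interaction clip, h ranges over latent key-segment assignments, Φ(x, h) is the joint feature map encoding object appearance and interrelations under that assignment, and w is the learned weight vector.

```latex
% Scoring: the best latent key-segment assignment h determines the score of clip x
f_w(x) \;=\; \max_{h \in \mathcal{H}(x)} \; w \cdot \Phi(x, h)

% Training: max-margin objective over labeled clips (x_i, y_i), with y_i \in \{-1, +1\}
\min_{w} \;\; \frac{1}{2}\,\lVert w \rVert^2 \;+\; C \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i\, f_w(x_i)\bigr)
```

In the usual latent SVM training scheme, this objective is minimized by alternating between imputing the best assignment h for the positive examples and solving the resulting convex problem in w; whether the thesis follows this exact coordinate-descent procedure is likewise an assumption here.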
Identifier
etd8290
Copyright statement
Copyright is held by the author.
Scholarly level
Graduate
Supervisor or Senior Supervisor
Thesis advisor: Mori, Greg
Member of collection
Download file | Size
---|---
etd8290_YSefidgar.pdf | 23.6 MB