In this thesis, we present new methods for multi-person scene understanding, which we approach through the task of group activity recognition. We identify key challenges in group activity recognition and present deep neural network-based approaches to handle them. We show that the proposed approaches achieve competitive performance on this task. We also study one of its key components, sequence modeling, in more detail by applying new sequence modeling methods to dense video captioning. Finally, we investigate how to compress these large deep neural networks for efficient recognition on specialized domain tasks.