Learning compositional models for activity understanding

Resource type: Thesis
Thesis type: (Thesis) Ph.D.
Date created: 2023-04-18
Authors/Contributors
Abstract
Compositionality serves as a key design principle in artificial intelligence algorithms. In this thesis, we focus on developing compositional models for activity understanding. The core idea of this thesis is to design compositional representations for human activity videos that are specific to the downstream task and are learned using the different types of compositional information available at various granularities of the videos. We apply this idea to a diverse set of video tasks aimed at understanding realistic activities. First, we introduce the task of generating human-object interactions in a zero-shot compositional setting and propose a generative model that uses an object-centric spatio-temporal scene graph to generate videos. Second, we address the problem of temporal action localization and develop an end-to-end learnable transformer model that represents the input video as a graph over video segments and the output space of actions as a graph of abstract learnable entities. Third, we focus on the task of long-term action anticipation and design a transformer-based model trained with a two-stage learning approach that employs segment-level and video-level representations for action anticipation. Overall, we demonstrate the benefits of designing compositional representations for human activity videos.
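To make the notion of an object-centric spatio-temporal scene graph concrete, the short Python sketch below shows one plausible way to organize such a representation: objects become nodes carrying per-frame bounding boxes, spatial-relation edges connect objects within a frame, and temporal edges link an object's consecutive observations across frames. All names here (ObjectNode, SceneGraph, observe, relate) are hypothetical illustrations for exposition and are not taken from the model described in the thesis.

    # Minimal, illustrative sketch of an object-centric spatio-temporal
    # scene graph. Hypothetical structure, not the thesis implementation.
    from dataclasses import dataclass, field

    @dataclass
    class ObjectNode:
        obj_id: int
        category: str                               # e.g. "person", "cup"
        boxes: dict = field(default_factory=dict)   # frame index -> (x, y, w, h)

    @dataclass
    class SceneGraph:
        nodes: dict = field(default_factory=dict)         # obj_id -> ObjectNode
        spatial_edges: set = field(default_factory=set)   # (frame, id_a, id_b, relation)
        temporal_edges: set = field(default_factory=set)  # (obj_id, frame_prev, frame_next)

        def add_object(self, obj_id, category):
            self.nodes[obj_id] = ObjectNode(obj_id, category)

        def observe(self, obj_id, frame, box):
            # Record a detection; link it temporally to the object's last observation.
            node = self.nodes[obj_id]
            if node.boxes:
                self.temporal_edges.add((obj_id, max(node.boxes), frame))
            node.boxes[frame] = box

        def relate(self, frame, id_a, id_b, relation):
            # Add a spatial relation between two objects within one frame.
            self.spatial_edges.add((frame, id_a, id_b, relation))

    # Usage: a person picking up a cup across two frames.
    g = SceneGraph()
    g.add_object(0, "person")
    g.add_object(1, "cup")
    g.observe(0, 0, (10, 10, 50, 120))
    g.observe(1, 0, (70, 60, 15, 20))
    g.relate(0, 0, 1, "reaches_for")
    g.observe(0, 1, (12, 10, 50, 120))
    g.observe(1, 1, (40, 50, 15, 20))
    g.relate(1, 0, 1, "holds")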
Document
Extent: 132 pages.
Identifier: etd22405
Copyright statement: Copyright is held by the author(s).
Permissions: This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor: Mori, Greg (Thesis advisor)
Language: English
Member of collection
Download file: etd22405.pdf (8.95 MB)
