
Deep video visual relation detection

Resource type: Thesis
Thesis type: M.Sc.
Date created: 2018-12-19
Authors/Contributors
Author: Fu, Yifang
Abstract
We propose a deep learning approach to the video visual relation detection problem, which aims to spatiotemporally localize objects in videos and then predict the relationships between them. A video visual relation instance is represented by a relation triplet together with the trajectories of its two objects. Our framework is composed of three stages. In stage one, an object tubelet detection model is applied to the video's RGB frames; it takes as input a sequence of frames and outputs object tubelets. In stage two, pairs of object tubelets are passed to a temporal relation detection model, which outputs a relation predicate between the objects as a relation tubelet. In stage three, detected short-term relation tubelets that share the same relation triplet and have sufficiently high volumetric overlap are associated into relation tubes. We validate our method on the VidVRD dataset and demonstrate that it outperforms the state-of-the-art baselines.
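
To make the third stage concrete, the following is a minimal Python sketch of how short-term relation tubelets sharing the same triplet could be greedily chained into longer relation tubes by volumetric overlap (vIoU). The RelationTubelet structure, the 0.5 threshold, and all helper names are illustrative assumptions, not the thesis's actual implementation.

from dataclasses import dataclass


@dataclass
class RelationTubelet:
    triplet: tuple     # (subject class, predicate, object class)
    start: int         # index of the first frame covered
    subj_boxes: list   # one (x1, y1, x2, y2) box per frame for the subject
    obj_boxes: list    # one (x1, y1, x2, y2) box per frame for the object


def box_iou(a, b):
    """Standard intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def viou(boxes_a, start_a, boxes_b, start_b):
    """Volumetric IoU: mean per-frame IoU over the overlapping frame span."""
    lo = max(start_a, start_b)
    hi = min(start_a + len(boxes_a), start_b + len(boxes_b))
    if hi <= lo:
        return 0.0
    ious = [box_iou(boxes_a[t - start_a], boxes_b[t - start_b])
            for t in range(lo, hi)]
    return sum(ious) / len(ious)


def associate(tubelets, thresh=0.5):
    """Greedily merge tubelets that share a relation triplet and overlap
    heavily on both trajectories into longer relation tubes."""
    tubes = []
    for t in sorted(tubelets, key=lambda x: x.start):
        for tube in tubes:
            if (tube.triplet == t.triplet
                    and viou(tube.subj_boxes, tube.start,
                             t.subj_boxes, t.start) >= thresh
                    and viou(tube.obj_boxes, tube.start,
                             t.obj_boxes, t.start) >= thresh):
                # Extend the tube with the non-overlapping tail of the tubelet.
                overlap = tube.start + len(tube.subj_boxes) - t.start
                tube.subj_boxes += t.subj_boxes[overlap:]
                tube.obj_boxes += t.obj_boxes[overlap:]
                break
        else:
            tubes.append(t)
    return tubes

In this sketch, requiring high vIoU on both the subject and object trajectories ensures that two tubelets describe the same object pair before their predicate spans are merged.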
Document
Identifier: etd20048
Copyright statement: Copyright is held by the author.
Permissions: This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor: Mori, Greg (thesis advisor)
Member of collection
Download file: etd20048.pdf (40.26 MB)
