Resource type
Thesis type
(Thesis) M.Sc.
Date created
2018-12-19
Authors/Contributors
Author: Fu, Yifang
Abstract
We propose a deep learning approach to the video visual relation detection problem, which aims to spatiotemporally localize objects in videos and then predict the interaction relationships between them. A video visual relation instance is represented by a relation triplet together with the trajectories of the two objects involved (object1 and object2). Our framework is composed of three stages. In stage one, an object tubelet detection model is applied to video RGB frames; it takes a sequence of frames as input and outputs object tubelets. In stage two, pairs of object tubelets are passed to a temporal relation detection model, which outputs a relation predicate between the objects as a relation tubelet. In stage three, detected short-term relation tubelets that have the same relation triplet and sufficiently high volume overlap are associated into a relation tube. We validate our method on the VidVRD dataset and demonstrate that it outperforms the state-of-the-art baselines.
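The stage-three association described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `RelationTubelet`, `trajectory_overlap`, and `associate` are hypothetical, the mean per-frame box IoU over shared frames is an assumed stand-in for the "volume overlap" criterion, and the 0.5 threshold is an arbitrary illustrative value.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class RelationTubelet:
    """Hypothetical container: a relation triplet plus two per-frame trajectories."""
    triplet: Tuple[str, str, str]      # (subject, predicate, object)
    subject_boxes: Dict[int, Box]      # frame index -> box
    object_boxes: Dict[int, Box]       # frame index -> box


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def trajectory_overlap(a: Dict[int, Box], b: Dict[int, Box]) -> float:
    """Assumed overlap measure: mean box IoU over frames both trajectories cover."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(box_iou(a[f], b[f]) for f in shared) / len(shared)


def associate(tubelets: List[RelationTubelet], threshold: float = 0.5) -> List[RelationTubelet]:
    """Greedily merge tubelets with the same triplet and enough overlap into tubes."""
    tubes: List[RelationTubelet] = []
    # Process tubelets in order of their first frame so tubes grow forward in time.
    for t in sorted(tubelets, key=lambda x: min(x.subject_boxes)):
        for tube in tubes:
            if (tube.triplet == t.triplet
                    and trajectory_overlap(tube.subject_boxes, t.subject_boxes) >= threshold
                    and trajectory_overlap(tube.object_boxes, t.object_boxes) >= threshold):
                # Extend the tube; boxes already present on shared frames are kept.
                for f, box in t.subject_boxes.items():
                    tube.subject_boxes.setdefault(f, box)
                for f, box in t.object_boxes.items():
                    tube.object_boxes.setdefault(f, box)
                break
        else:
            # No compatible tube found: start a new one from a copy of this tubelet.
            tubes.append(RelationTubelet(t.triplet, dict(t.subject_boxes), dict(t.object_boxes)))
    return tubes
```

In this sketch, two tubelets labelled with the same triplet whose subject and object trajectories coincide on their shared frames are merged into one longer tube, while a tubelet with a different triplet always starts a tube of its own.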
Document
Identifier
etd20048
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Mori, Greg
Member of collection
Download file | Size |
---|---|
etd20048.pdf | 40.26 MB |