Chen, Lei

Thesis type

(Thesis) Ph.D.

Date created

2022-01-26

Authors/Contributors

Author: Chen, Lei

Abstract

The success of machine learning relies heavily on data, and is therefore also limited by it when sufficient annotation cannot be provided for a standard supervised training pipeline. Weakly-supervised learning addresses the lack of fully annotated training data by relaxing the annotation requirement to a level weaker than the desired output. We study the problem of weakly-supervised localization and grounding of actions and objects, which enables the corresponding machine learning models to be trained without ground-truth location annotations. We propose to exploit the structural information in weakly-supervised data to facilitate the learning of these models, and we present three novel approaches to these tasks. In the first work, we explore the temporal structure of videos and design an attention-based loss function that helps the action localization model focus on distinctive moments, improving robustness and performance in the weakly-supervised setting. In the second work, we use the contextual structure between visual and textual data and propose an iterative, context-aware refinement of the textual and visual representations for the weakly-supervised visual grounding task; this gives the semantic embeddings the flexibility to resolve ambiguity and adapt to different grounding scenarios. In the third work, we take advantage of higher-level relational structure across data to extend an existing interpretability method to embedding networks for localization; the resulting method also serves as a visual explanation that helps interpret this particular type of neural network.

Keywords

Identifier

etd21802

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Mori, Greg

Language

English

Member of collection

Computing Science Theses


Title

Deep networks for weakly-supervised localization and visual grounding
