Resource type
Thesis type
(Thesis) M.Sc.
Date created
2022-10-20
Authors/Contributors
Author: Sharma, Akshit
Abstract
Coreference resolution is a challenging problem that requires clustering the mentions in a text document according to the objects they refer to. Most work on it has relied extensively on text-only datasets, which fail to provide visual cues about the entities represented by the phrases. To address this, we introduce DenseRefer3D, a language and 3D dataset that aligns rich referring expressions with real-world objects, and an annotation tool, DenseRefer3D-Annotator, that facilitates the rendering of natural language sentences and 3D scenes. The tool provides functionality to efficiently manage the data collection workflow on the MTurk crowdsourcing platform and enables effective visualization of coreference links and phrase-to-object mappings. We outline several coreference experiments using an end-to-end deep learning approach, analyze the quality of detected mentions and clustering, propose a new task that directly aligns textual phrases with 3D objects, and explore ways to further research in the combined domain of language and vision.
Document
Extent
86 pages.
Identifier
etd22202
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Chang, Angel
Language
English
Member of collection
Download file | Size |
---|---|
etd22202.pdf | 7.56 MB |