Coreference resolution is a challenging problem that requires clustering mentions in a text document according to their referents. Most prior work relies on text-only datasets, which provide no visual cues about the entities the phrases describe. To address this, we introduce DenseRefer3D, a language-and-3D dataset that aligns rich referring expressions with real-world objects, together with an annotation tool, DenseRefer3D-Annotator, that renders natural language sentences alongside 3D scenes. The tool efficiently manages the data collection workflow on the MTurk crowdsourcing platform and enables effective visualization of coreference links and phrase-to-object mappings. We present several coreference experiments using an end-to-end deep learning approach, analyze the quality of detected mentions and clusters, propose a new task that directly aligns textual phrases with 3D objects, and explore directions for further research at the intersection of language and vision.
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Chang, Angel