Neural state machine for 2D and 3D visual question answering

Thesis type
(Thesis) M.Sc.
Date created
This thesis focuses on the Visual Question Answering (VQA) task in 2D images and 3D environments using the Neural State Machine (NSM). The NSM is a state-of-the-art approach for VQA that simulates reasoning over scene graphs. We re-implement and extend the NSM by adding a narrowing mechanism for localised attention, and by applying bilinear attention on scene graph representations of the input scene. We show that these extensions lead to improved performance on the VQA task in both the 2D and 3D domains. Prior work on VQA has focused on reasoning in the 2D image domain, and has not addressed how the VQA task can be formulated with 3D data. To address the 3D domain, we create a 3D VQA dataset based on 3D reconstructions of real environments. Then, we compare the performance of the NSM with common approaches for 3D VQA on a range of question types. We show that the NSM is competitive with other VQA methods in the 3D domain and that our extensions also lead to improved VQA accuracy there.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Chang, Angel
Member of collection
Attachment Size
input_data\22311\etd21675.pdf 2.15 MB