Neural state machine for 2D and 3D visual question answering

Author: 
Date created: 
2021-08-03
Identifier: 
etd21675
Keywords: 
Visual question answering
VQA
The neural state machine
NSM
3D VQA
3DVQA
3D visual question answering
Abstract: 

This thesis focuses on the Visual Question Answering (VQA) task in 2D images and 3D environments using the Neural State Machine (NSM). The NSM is a state-of-the-art approach for VQA that simulates reasoning over scene graphs. We re-implement and extend the NSM by adding a narrowing mechanism for localised attention, and by applying bilinear attention on scene graph representations of the input scene. We show that these extensions lead to improved performance on the VQA task in both the 2D and 3D domains. Prior work on VQA has focused on reasoning in the 2D image domain, and has not addressed how the VQA task can be formulated with 3D data. To address the latter domain, we create a 3D VQA dataset based on 3D reconstructions of real environments. We then compare the performance of the NSM with common approaches for 3D VQA on a range of question types. We show that the NSM is competitive with other VQA methods in the 3D domain, and that our extensions also improve VQA accuracy there.

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Supervisor(s): 
Angel Chang
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.