Author: Tang, Shitao
This thesis presents a method for camera localization. Given a set of reference images with known camera poses, camera localization aims to estimate the 6 DoF camera pose of an arbitrary query image captured in the same environment. It can also be generalized to recover the 6 DoF pose of each frame of an input query video. Traditional methods detect and match interest points between the query image and a pre-built 3D model, and then solve for the camera pose with a PnP algorithm combined with RANSAC. The recent development of deep learning has motivated end-to-end approaches to camera localization. These methods encode scene structure into the parameters of a convolutional neural network (CNN), which predicts a dense coordinate map for a query image, where each pixel records a 3D scene coordinate. This dense coordinate map can be used to estimate the camera pose in the same way as traditional methods. However, most of these learning-based methods require re-training or re-adaptation for a new scene and have difficulty handling large-scale scenes due to limited network capacity. In this thesis, we present a new method for scene-agnostic camera localization that can be applied to a novel scene without retraining. This scene-agnostic localization is achieved with our dense scene matching (DSM) technique, which constructs a cost volume between a query image and a scene. The cost volume is fed to a CNN to predict a dense coordinate map, from which the 6 DoF camera pose is computed. In addition, our method can be directly applied to query video clips, yielding an additional performance boost at test time by exploiting temporal constraints between neighboring frames. Our method achieves state-of-the-art performance on several benchmarks.
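The final pose-solving step shared by the traditional pipeline and the dense-coordinate-map approaches above can be sketched as follows. This is a minimal illustrative implementation, not the thesis's own code: it assumes calibrated (normalized) image coordinates, uses a simple DLT solver inside a RANSAC loop in place of a minimal P3P solver, and all names, thresholds, and the synthetic data are my own assumptions.

```python
import numpy as np

def dlt_pose(X, x):
    # Estimate [R | t] from n >= 6 world points X (n,3) and normalized
    # image points x (n,2) via the Direct Linear Transform (DLT).
    n = X.shape[0]
    Xh = np.hstack([X, np.ones((n, 1))])          # homogeneous 3D points
    A = np.zeros((2 * n, 12))
    A[0::2, 0:4] = -Xh                            # u * (p3.X) - p1.X = 0
    A[0::2, 8:12] = x[:, 0:1] * Xh
    A[1::2, 4:8] = -Xh                            # v * (p3.X) - p2.X = 0
    A[1::2, 8:12] = x[:, 1:2] * Xh
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)                      # solution up to scale
    lam = np.cbrt(np.linalg.det(P[:, :3]))        # fix scale and sign
    P = P / lam
    U, _, Vt2 = np.linalg.svd(P[:, :3])           # project onto SO(3)
    return U @ Vt2, P[:, 3]

def reproj_error(R, t, X, x):
    p = X @ R.T + t                               # points in camera frame
    return np.linalg.norm(p[:, :2] / p[:, 2:3] - x, axis=1)

def ransac_pnp(X, x, iters=200, thresh=1e-2, seed=0):
    # RANSAC: repeatedly fit a pose to a random minimal sample and keep
    # the model that agrees with the most correspondences (inliers).
    rng = np.random.default_rng(seed)
    best_inl = None
    for _ in range(iters):
        idx = rng.choice(len(X), 6, replace=False)
        R, t = dlt_pose(X[idx], x[idx])
        inl = reproj_error(R, t, X, x) < thresh
        if best_inl is None or inl.sum() > best_inl.sum():
            best_inl = inl
    R, t = dlt_pose(X[best_inl], x[best_inl])     # refit on all inliers
    return R, t, best_inl

# Synthetic check (illustrative values): a known pose and 20% outliers.
rng = np.random.default_rng(1)
p_cam = rng.uniform(-1.0, 1.0, (100, 3))
p_cam[:, 2] += 5.0                                # keep points in front
R_gt = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(R_gt) < 0:
    R_gt[:, 0] *= -1.0                            # ensure det(R) = +1
t_gt = np.array([0.1, -0.2, 0.3])
X = (p_cam - t_gt) @ R_gt                         # world points: R^T (p - t)
x = p_cam[:, :2] / p_cam[:, 2:3]                  # normalized image coords
x[:20] += rng.uniform(0.5, 1.0, (20, 2))          # gross outliers
R_est, t_est, inl = ransac_pnp(X, x)
```

In practice, production systems use a minimal P3P solver inside RANSAC followed by nonlinear refinement (e.g. OpenCV's `cv2.solvePnPRansac`); the DLT variant above is only meant to make the abstract's "PnP combined with RANSAC" step concrete.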
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Tan, Ping