Advanced subspace methods for low/mid-level vision

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2020-02-19
Authors/Contributors
Abstract
Low- and mid-level vision tasks are fundamental to computer vision. They matter both in their own right and as cornerstones of higher-level tasks. Low-level tasks extract primitive information, such as edges, textures, and correspondences, from images. Mid-level tasks, from the Gestalt psychologists' perspective, are grouping mechanisms that operate on low-level visual information. In particular, inferring geometric information from images and segmenting an image into object-level regions are two major aspects of mid-level tasks. In this thesis, we make advances in solving real-world low- and mid-level problems using subspace-based representations. For monocular visual SLAM, we solve visual odometry as a rank-1 factorization and solve pose-graph optimization by multi-stage linear programming; these formulations are more robust to initialization errors in the local 3D maps and the global pose graph, respectively. For dense 3D reconstruction, which is also a mid-level task, we represent a depth map as a linear combination of several basis depths from an underlying subspace, and train a convolutional neural network to generate such a basis. To estimate the depth maps as well as the camera poses, we propose a differentiable bundle adjustment layer that optimizes both by minimizing a feature-metric error. The feature-metric error is defined over a feature pyramid, which is learned jointly with the basis generator end-to-end. For broader low-level vision tasks, we also adopt a basis representation, but for a different purpose. Conventionally, a low-level task is formulated as a continuous energy minimization problem, where the objective function contains a data fidelity term and a smoothness regularization term. We replace the regularization term with a learnable subspace constraint and define the objective function only with the data term.
This methodology unifies the network structures and the parameters for many low-level vision tasks and even generalizes to unseen tasks, as long as the corresponding data terms can be formulated. In summary, we explore subspace-based methods, from manually derived low-rank formulations to learning-based subspace minimization, which are conceptually novel compared to existing methods. To demonstrate the effectiveness of the proposed methods, we conduct extensive experiments for all the involved tasks on public benchmarks as well as our own data. The results show that our methods achieve performance comparable to or better than state-of-the-art methods, with better computational efficiency.
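
The subspace idea in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the thesis learns the basis with a convolutional neural network, whereas here the basis is an arbitrary matrix supplied by the caller, and the function name `solve_in_subspace` and the linear data term `A x ≈ b` are assumptions made for illustration. The unknown (for example, a flattened depth map) is written as `x = basis @ c`, so minimizing the data term alone over the coefficients `c` takes the place of an explicit smoothness regularizer.

```python
import numpy as np


def solve_in_subspace(basis, A, b):
    """Minimize ||A x - b||^2 subject to x = basis @ c.

    basis : (n, k) array whose columns span the subspace (hypothetical input;
            in the thesis a network generates the basis).
    A     : (m, n) array, an assumed linear data term.
    b     : (m,) array of observations.
    Returns the reconstructed x (n,) and the coefficients c (k,).
    """
    # Substituting x = basis @ c reduces the problem to k unknowns.
    reduced = A @ basis
    c, *_ = np.linalg.lstsq(reduced, b, rcond=None)
    return basis @ c, c
```

With only a few observed entries of `x` (rows of the identity as `A`), the subspace constraint is what makes the remaining entries recoverable; without it the problem would be underdetermined.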
Document
Identifier
etd20758
Copyright statement
Copyright is held by the author.
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Tan, Ping
Member of collection
Download file Size
etd20758.pdf 53.64 MB

Views & downloads - as of June 2023

Views: 0
Downloads: 2