Tang, Chengzhou

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2020-02-19

Authors/Contributors

Author: Tang, Chengzhou

Abstract

Low- and mid-level vision tasks are fundamental to computer vision. They are important in their own right and also serve as cornerstones for higher-level tasks. Low-level tasks extract primitive information, such as edges, textures, and correspondences, from images. Mid-level tasks, from the Gestalt psychologists' perspective, are grouping mechanisms that operate on low-level visual information. In particular, inferring geometric information from images and segmenting an image into object-level regions are two major aspects of mid-level tasks. In this thesis, we make advances in solving real-world low- and mid-level problems using subspace-based representations. For monocular visual SLAM, we solve visual odometry as a rank-1 factorization and solve pose-graph optimization by multi-stage linear programming; these formulations are more robust to initialization errors in the local 3D maps and the global pose graph, respectively. For dense 3D reconstruction, which is also a mid-level task, we represent a depth map as a linear combination of several basis depth maps from an underlying subspace and train a convolutional neural network to generate such a basis. To estimate the depth maps together with the camera poses, we propose a differentiable bundle adjustment layer that optimizes both by minimizing a feature-metric error. The feature-metric error is defined over a feature pyramid, which is learned jointly with the basis generator end-to-end. For broader low-level vision tasks, we also adopt a basis representation, but for a different purpose. Conventionally, a low-level task is formulated as a continuous energy minimization problem whose objective function contains a data fidelity term and a smoothness regularization term. We replace the regularization term with a learnable subspace constraint and define the objective function using only the data term. This methodology unifies the network structures and parameters across many low-level vision tasks and even generalizes to unseen tasks, as long as the corresponding data terms can be formulated. In summary, we explore subspace-based methods ranging from manually derived low-rank formulations to learning-based subspace minimization, which are conceptually novel compared with existing methods. To demonstrate the effectiveness of the proposed methods, we conduct extensive experiments for all the involved tasks on public benchmarks as well as our own data. The results show that our methods achieve performance comparable to or better than state-of-the-art methods, with better computational efficiency.
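
The subspace idea described in the abstract can be illustrated with a toy sketch. This is not the thesis implementation: in the thesis the basis is generated by a learned convolutional network, whereas here the basis is hand-picked, the signal is one-dimensional, and the names `solve_in_subspace` and `reconstruct` are hypothetical. The sketch only shows how constraining the solution to a linear combination of basis vectors lets the objective consist of a data term alone, with no explicit smoothness regularizer.

```python
# Toy illustration only. The basis, the observation, and all function names
# are assumptions made for this sketch; they do not come from the thesis.

def solve_in_subspace(basis, observation):
    """Minimize ||sum_k c_k * basis[k] - observation||^2 over coefficients c.

    The data term is a plain least-squares fit; the subspace constraint is
    enforced by construction, because only the coefficients are free.
    Solves the normal equations (B^T B) c = B^T y by Gauss-Jordan elimination.
    """
    k = len(basis)
    # Augmented normal-equation matrix [B^T B | B^T y].
    aug = [
        [sum(bi * bj for bi, bj in zip(basis[i], basis[j])) for j in range(k)]
        + [sum(bi * yi for bi, yi in zip(basis[i], observation))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for row in range(k):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][k] for i in range(k)]


def reconstruct(basis, coeffs):
    """Return the linear combination sum_k coeffs[k] * basis[k]."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]


# Hand-picked basis: a constant vector and a linear ramp (smooth by design).
basis = [[1.0, 1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 2.0, 3.0, 4.0]]
# Noisy observation of a roughly linear signal.
observation = [1.1, 2.9, 5.2, 6.8, 9.0]

coeffs = solve_in_subspace(basis, observation)
estimate = reconstruct(basis, coeffs)
print("coefficients:", coeffs)
print("estimate:", estimate)
```

Because the estimate can only be a combination of the two smooth basis vectors, the noise in the observation is suppressed without adding a regularization term to the objective; the choice of basis plays the role that the regularizer would otherwise play.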

Keywords

Identifier

etd20758

Copyright statement

Copyright is held by the author.

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Scholarly level

Graduate student (PhD)

Supervisor or Senior Supervisor

Thesis advisor: Tan, Ping

Member of collection

Computing Science Theses

Download file	Size
etd20758.pdf	53.64 MB

Advanced subspace methods for low/mid-level vision
