This thesis studies the problem of articulated object reconstruction from an input video. Our focus is on estimating the shape, pose, and part motion of an articulated object during human-object manipulation. The task is challenging because the object changes dynamically and recovering 3D structure from 2D observations is inherently ambiguous. To enable research in this direction, we first create D3D-HOI: a dataset of monocular videos with ground-truth annotations of 3D object shape, pose, and part motion from human-object interaction videos. Our dataset consists of several common categories of articulated objects in diverse real-world scenes, observed from a variety of fixed camera viewpoints. Each manipulated object (e.g., a microwave) is represented by the 3D parametric model that best fits the captured data, and we annotate its size, pose, and part articulation values at every frame. We then propose a novel optimization-based method built on a differentiable renderer and human-object interaction terms, which leverages the estimated human pose to better infer the object's spatial layout and dynamics. We evaluate this approach on our dataset, demonstrating that human-object relations can significantly reduce pose and motion errors on real-world articulated objects. Code and dataset are available at https://github.com/facebookresearch/d3d-hoi.
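To make the optimization idea concrete, here is a minimal sketch, not the thesis implementation: it jointly optimizes an object's global placement and a single hinge angle by gradient descent, combining a silhouette-matching term (a symmetric Chamfer distance standing in for the differentiable-rendering loss) with an interaction term that keeps a grasped handle near an estimated 3D hand joint. All geometry and names here (door, handle_idx, hand_pos, target_pts) are hypothetical placeholders.

```python
import torch

def rot_y(theta: torch.Tensor) -> torch.Tensor:
    # 3x3 rotation about the y-axis, differentiable w.r.t. theta.
    c, s = torch.cos(theta), torch.sin(theta)
    one, zero = torch.ones_like(theta), torch.zeros_like(theta)
    return torch.stack([
        torch.stack([c, zero, s]),
        torch.stack([zero, one, zero]),
        torch.stack([-s, zero, c]),
    ])

# Canonical part geometry: a door panel sampled as a point grid in the
# x-y plane, hinged at x = 0 (purely illustrative, not a CAD model).
u, v = torch.meshgrid(torch.linspace(0, 1, 10),
                      torch.linspace(0, 1, 10), indexing="ij")
door = torch.stack([u.flatten(), v.flatten(),
                    torch.zeros(100)], dim=-1)      # (100, 3)
handle_idx = 95                                     # point near the free edge

# Hypothetical observations: 2D silhouette sample points and a 3D
# hand-joint position from an off-the-shelf human pose estimator.
with torch.no_grad():
    gt = door @ rot_y(torch.tensor(0.7)).T + torch.tensor([0.2, 0.0, 0.5])
    target_pts = gt[:, :2]                          # orthographic projection
    hand_pos = gt[handle_idx]                       # hand grasps the handle

# Parameters to optimize: global translation and the part (hinge) angle.
trans = torch.zeros(3, requires_grad=True)
angle = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([trans, angle], lr=0.05)

for step in range(300):
    opt.zero_grad()
    pts3d = door @ rot_y(angle[0]).T + trans        # articulate, then place
    proj = pts3d[:, :2]                             # project to image plane
    # Silhouette term: symmetric Chamfer distance to the observed mask
    # points (stands in for the differentiable-rendering loss).
    d = torch.cdist(proj, target_pts)
    sil = d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
    # Interaction term: the grasped handle should stay near the hand joint.
    inter = (pts3d[handle_idx] - hand_pos).pow(2).sum()
    loss = sil + 0.1 * inter
    loss.backward()
    opt.step()
```

The interaction term illustrates the key intuition from the abstract: the human pose constrains where the manipulated part can be, which disambiguates depth and articulation values that a silhouette loss alone cannot resolve.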
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Yasutaka Furukawa