Articulated object reconstruction from interaction videos

Thesis type
(Thesis) M.Sc.
Date created
Author: Xu, Xiang
This thesis studies the problem of reconstructing an articulated object from an input video. Our focus is on estimating the shape, pose, and part motion of an articulated object during human-object manipulation. The task is challenging because the object changes dynamically and 3D reconstruction from 2D input is inherently ambiguous. To enable research in this direction, we first create D3D-HOI: a dataset of monocular human-object interaction videos with ground-truth annotations of 3D object shape, pose, and part motion. Our dataset consists of several common categories of articulated objects in diverse real-world scenes, observed from a variety of fixed camera viewpoints. Each manipulated object (e.g., a microwave) is represented using a 3D parametric model that best fits the captured data. We then annotate the size, pose, and part articulation values at every frame. We propose a novel optimization-based method built on a differentiable renderer and human-object interaction terms, which leverage the human pose to better infer the object's spatial layout and dynamics. We evaluate this new approach on our dataset, demonstrating that human-object relations can significantly reduce the pose and motion errors on real-world articulated objects. Code and dataset are available at the following link (
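To make the per-frame annotation of size, pose, and part articulation more concrete, the following is a minimal illustrative sketch of a parametric articulated object, assuming a microwave-like box with one door hinged on a vertical edge. It is not the D3D-HOI model or the method proposed in the thesis. The names `FrameState`, `rotate_y`, and `door_corners`, the choice of axes, and the restriction of the global pose to a single yaw angle are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class FrameState:
    """Hypothetical per-frame parameters (not the thesis's actual format)."""
    size: tuple          # (width, height, depth) of the box body
    yaw: float           # global rotation about the vertical (y) axis, radians
    translation: tuple   # global (x, y, z) position of the box centre
    door_angle: float    # part articulation in radians; 0 means closed


def rotate_y(p, angle):
    """Rotate point p = (x, y, z) about the y axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def door_corners(state):
    """Return the four door corners in world coordinates.

    Assumed object frame: box centred at the origin, front face at z = +d/2,
    door hinge along the vertical edge at x = -w/2.
    """
    w, h, d = state.size
    hinge = (-w / 2, 0.0, d / 2)
    # Door corners relative to the hinge when the door is closed.
    closed = [(0.0, -h / 2, 0.0), (w, -h / 2, 0.0),
              (w, h / 2, 0.0), (0.0, h / 2, 0.0)]
    corners = []
    for p in closed:
        # Part articulation: swing the door outward (+z) about the hinge.
        q = rotate_y(p, -state.door_angle)
        q = (q[0] + hinge[0], q[1] + hinge[1], q[2] + hinge[2])
        # Global pose: rotate the whole object, then translate it.
        q = rotate_y(q, state.yaw)
        t = state.translation
        corners.append((q[0] + t[0], q[1] + t[1], q[2] + t[2]))
    return corners


closed_state = FrameState((0.5, 0.3, 0.4), 0.0, (0.0, 0.0, 0.0), 0.0)
open_state = FrameState((0.5, 0.3, 0.4), 0.0, (0.0, 0.0, 0.0), math.pi / 2)
closed_pts = door_corners(closed_state)
open_pts = door_corners(open_state)
```

In this sketch, a video would correspond to one `FrameState` per frame, with the size held fixed and the pose and door angle allowed to vary. An optimization-based approach of the kind described above would adjust such parameters until the rendered object agrees with the observations, which this sketch does not attempt.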
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Furukawa, Yasutaka
Member of collection
Attachment Size
input_data\22179\etd21501.pdf 16.53 MB