Indoor environments are mainly composed of man-made, functional 3D objects plausibly arranged in a region-bounded space. Arguably, the most important goal when designing an indoor scene is to make it functional, meaning that the resulting scene should serve its intended usage. For example, a living room typically comprises a sofa set and a TV, serving the usage of "watching TV", and their arrangement in the room should not constrain movement and space usage. This is determined, in part, by object relations and their co-occurrences, as well as by the functionalities and interactions offered by the individual objects themselves. Object functionality involves part-level reasoning, since object parts can undergo motions or articulations. The furniture models that typically make up indoor scenes inevitably reveal their interiors when their parts articulate: for example, the interior of a cabinet becomes visible when its door is opened. In other words, both the overall scene layout, in terms of object arrangements, and the objects themselves, in terms of their part motions and geometry, must serve the intended purposes. Developing computational tools for designing such indoor scenes is challenging because the desired solutions cannot always be reduced, if at all, to a set of simple rules. With the availability of large datasets and appropriate computational resources, it is natural to seek data-driven, learning-based algorithms to model these entities. This dissertation explores the design of 3D indoor scenes, with advanced algorithmic development and evaluation in mind, going from the scene layout level to the object level, where functionality plays a key role in both cases.
To this end, the dissertation is made up of three works. The first, GRAINS, presents the first end-to-end deep generative hierarchical neural network that learns object relations and co-occurrences following indoor object arrangement rules, and synthesizes novel 3D scenes. There is, however, a lack of principled means to evaluate and compare the generated scenes, which motivates the second work, LayoutGMN. As a first step, it presents a neural graph matching network that compares abstractions of 3D indoor scenes in a structural manner. Finally, to understand and model functionality at the object level, the third work presents RoSI, a neural framework for recovering 3D shape interiors and realizing 3D part motions from sparse multi-articulation images of 3D shapes.
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Zhang, Hao