Image layer separation is an important step toward image understanding and facilitates many image processing applications. It aims to decompose a single image into multiple layers, each capturing a different component of the image. These layers are either physics-based, such as the reflectance layer in intrinsic image decomposition, or semantic, such as the occlusion layer in image de-hazing and raindrop removal. Since the number of unknowns is at least twice the number of observed values, image layer separation problems are ill-posed and challenging. To make such problems tractable, traditional methods impose additional constraints derived from prior knowledge, whereas recent deep learning methods rely on training data. In this thesis, we propose an optimization-based method built on handcrafted priors for video de-fencing (separating fence-like occlusion layers from dynamic videos), and an unsupervised training scheme for deep networks that exploits unlabeled real images from the Internet, which we apply to highlight separation and intrinsic image decomposition.

Traditional methods derive additional constraints from observations and priors and solve the separation as an optimization problem. Following this approach, we solve video de-fencing with a novel bottom-up, optimization-based pipeline. We present a fully automatic approach to detect and segment fence-like occluders in a video clip. Unlike previous approaches, which usually assume either a static scene or a static camera, our method handles both dynamic scenes and moving cameras. We then discuss the main challenge facing recent deep learning methods for image layer separation: the lack of real-world training data with ground truth. To address it, we propose an unsupervised scheme for training networks on unlabeled real images.
We apply this scheme to two image layer separation problems: highlight separation for facial images, trained on celebrity photos, and non-Lambertian intrinsic image decomposition, trained on customer product photos. Finally, we demonstrate an application of the separated image layers, using faces as light probes to estimate environment illumination. Illumination estimation is important for mixed reality applications such as inserting virtual objects into real photos. Our technique estimates illumination with high precision as a non-parametric environment map and works well for both indoor and outdoor scenes.
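To make the ill-posedness claim in the first paragraph concrete, the sketch below counts observations and unknowns for a generic two-layer model. It is an illustrative assumption, not the thesis's own image formation model: an image with N pixels and C channels, two layers that each have C channels, and either an additive combination (as in diffuse plus specular for highlight separation) or a per-channel multiplicative one (as in reflectance times shading for intrinsic decomposition).

```latex
\begin{aligned}
&\text{Additive model:}       && I_p = L_{1,p} + L_{2,p}, \qquad p = 1,\dots,N \\
&\text{Multiplicative model:} && I_p = R_p \odot S_p \\
&\text{Counting:}             && NC \text{ observed values}, \qquad 2NC \text{ unknowns} \\
&\text{Ambiguity:}            && \bigl(L_1 + \Delta,\; L_2 - \Delta\bigr) \text{ and }
                                 \bigl(\alpha_p R_p,\; S_p / \alpha_p\bigr) \text{ reproduce } I
                                 \text{ for any } \Delta \text{ and any } \alpha_p \neq 0
\end{aligned}
```

Because every such pair reproduces the input exactly, the observations alone cannot select a decomposition. Extra constraints, whether handcrafted priors or priors learned from training data, are what single out one solution.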
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor (Supervisor or Senior Supervisor): Tan, Ping