Resource type
Thesis type
(Thesis) Ph.D.
Date created
2024-05-10
Authors/Contributors
Author: Bai, Ziqian
Abstract
3D digital humans have become a fundamental technology for numerous downstream applications, including movie and game production, augmented and virtual reality communication, virtual try-on, and virtual tourism. However, properly modeling 3D humans is non-trivial due to their complex geometry, appearance, and motion, which arise from large variations across identities and the sophisticated biological structure of the human body. Therefore, it is crucial to divide 3D digital human modeling into sub-problems and address their unique challenges. In this thesis, we propose solutions for estimating reconstructions and building avatars of faces, heads, and bodies to advance the field of 3D digital human modeling. We start with monocular 3D face reconstruction and propose a mesh-based approach, "Deep Facial Non-Rigid Multi-View Stereo" (DFNRMVS). We design an end-to-end trainable neural network with an embedded differentiable in-network optimization that enforces constraints governed by first principles (e.g., multi-view consistency and landmark alignment). DFNRMVS combines the advantages of traditional optimization and deep learning, leading to more accurate 3D reconstructions with good generalization. Next, we extend the static 3D reconstructions of DFNRMVS to animatable 3D face avatars by proposing INORig. We redesign the linear face model in DFNRMVS into a non-linear neural avatar with separate identity and expression latent codes to support animation. INORig achieves state-of-the-art reconstruction accuracy, reasonable robustness and generalization, and can be used in standard avatar applications. Then, we propose MonoAvatar to learn high-quality implicit 3D head avatars from monocular RGB videos captured in the wild. To achieve user-controlled facial expressions and head poses as well as photorealistic renderings, we propose to predict a Neural Radiance Field (NeRF) from expression-dependent local features attached to the mesh vertices of a 3D Morphable Model (3DMM), leading to a 3DMM-anchored NeRF framework. Extensive experiments show that we are able to reconstruct high-quality avatars, with more accurate expression-dependent details, good generalization to expressions outside the training set, and quantitatively superior renderings compared to other state-of-the-art approaches. Finally, we propose AutoAvatar, an implicit body avatar that captures history-dependent dynamic effects governed by physics, such as inertia and elastic deformation of soft tissues. Our approach enables, for the first time, autoregressive modeling of neural implicit surfaces, and achieves plausible dynamic deformations even for unseen motions.
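The abstract's description of the 3DMM-anchored NeRF in MonoAvatar suggests a simple structure: expression-dependent feature vectors live on the mesh vertices of a 3DMM, and a radiance-field network is conditioned on features gathered from the mesh near each query point. The sketch below is a hypothetical NumPy illustration of that idea only, not the thesis implementation; the vertex count, feature dimension, k-nearest-vertex blending, and the toy MLP are all assumptions made for the example.

```python
# Hypothetical sketch of a 3DMM-anchored radiance field: per-vertex local
# features condition a small MLP at each 3D query point. Everything here
# (shapes, k-NN blending, network size) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# A deformed 3DMM mesh for the current expression: V vertices in 3D, each
# carrying an expression-dependent local feature vector (dimension assumed 16).
num_vertices, feat_dim = 5023, 16
vertices = rng.normal(size=(num_vertices, 3))
vertex_features = rng.normal(size=(num_vertices, feat_dim))

# Stand-in weights for a tiny radiance-field head producing (density, RGB).
W1 = rng.normal(size=(3 + feat_dim, 64)) * 0.1
W2 = rng.normal(size=(64, 4)) * 0.1

def query_radiance_field(points, k=8):
    """For each query point, gather features from the k nearest mesh vertices,
    blend them by inverse distance, and run a small MLP to get (density, rgb)."""
    dists = np.linalg.norm(points[:, None, :] - vertices[None, :, :], axis=-1)
    knn_idx = np.argsort(dists, axis=1)[:, :k]                   # (N, k)
    knn_d = np.take_along_axis(dists, knn_idx, axis=1)           # (N, k)
    w = 1.0 / (knn_d + 1e-6)
    w /= w.sum(axis=1, keepdims=True)                            # normalized weights
    local_feat = (vertex_features[knn_idx] * w[..., None]).sum(axis=1)  # (N, F)
    x = np.concatenate([points, local_feat], axis=-1)            # anchored input
    h = np.maximum(x @ W1, 0.0)                                  # ReLU hidden layer
    out = h @ W2
    density = np.log1p(np.exp(out[:, :1]))    # softplus keeps density non-negative
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))   # sigmoid keeps colors in [0, 1]
    return density, rgb

# Query a few sample points along a camera ray.
sample_points = rng.normal(size=(4, 3))
density, rgb = query_radiance_field(sample_points)
print(density.shape, rgb.shape)  # (4, 1) (4, 3)
```

Because the features are attached to mesh vertices, they move with the 3DMM as the expression changes, so the radiance field is driven by the animated mesh while the network fills in photorealistic, expression-dependent detail.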
Document
Extent
101 pages.
Identifier
etd23113
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Tan, Ping
Thesis advisor: Furukawa, Yasutaka
Language
English
Member of collection
| Download file | Size |
| --- | --- |
| etd23113.pdf | 43.06 MB |