Resource type
Thesis type
(Thesis) Ph.D.
Date created
2024-05-10
Authors/Contributors
Author: Bai, Ziqian
Abstract
3D digital humans have become a fundamental technology for numerous downstream applications, including movie and game production, augmented and virtual reality communication, virtual try-on, and virtual tourism. However, properly modeling 3D humans is non-trivial due to their complex geometry, appearance, and motion, which arise from large variations across identities and the sophisticated biological structure of the human body. Therefore, it is crucial to divide 3D digital human modeling into sub-problems and address their unique challenges. In this thesis, we propose solutions for estimating reconstructions and building avatars of faces, heads, and bodies to advance the field of 3D digital human modeling. We start with monocular 3D face reconstruction and propose a mesh-based approach, "Deep Facial Non-Rigid Multi-View Stereo" (DFNRMVS). We design an end-to-end trainable neural network with an embedded differentiable in-network optimization that enforces constraints governed by first principles (e.g., multi-view consistency and landmark alignment). DFNRMVS combines the advantages of traditional optimization and deep learning, leading to more accurate 3D reconstructions with good generalization. Next, we extend the static 3D reconstructions of DFNRMVS to animatable 3D face avatars by proposing INORig. We redesign the linear face model in DFNRMVS into a non-linear neural avatar with separate identity and expression latent codes to support animation. INORig achieves state-of-the-art reconstruction accuracy, reasonable robustness and generalization, and can be used in standard avatar applications. Then, we propose MonoAvatar to learn high-quality implicit 3D head avatars from monocular RGB videos captured in the wild. To achieve user-controlled facial expressions and head poses as well as photorealistic renderings, we propose to predict a Neural Radiance Field (NeRF) from expression-dependent local features attached to the mesh vertices of a 3D Morphable Model (3DMM), leading to a 3DMM-anchored NeRF framework. Extensive experiments show that we are able to reconstruct high-quality avatars, with more accurate expression-dependent details, good generalization to expressions outside the training set, and quantitatively superior renderings compared to other state-of-the-art approaches. Finally, we propose AutoAvatar, an implicit body avatar that captures history-dependent dynamic effects governed by physics, such as inertia and elastic deformation of soft tissues. Our approach enables, for the first time, autoregressive modeling of neural implicit surfaces, and achieves plausible dynamic deformations even for unseen motions.
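The abstract's description of the 3DMM-anchored NeRF in MonoAvatar suggests a simple structure: expression-dependent feature vectors live on the mesh vertices of a 3DMM, and a radiance-field network is conditioned on features gathered from the mesh near each query point. The sketch below is a hypothetical NumPy illustration of that idea only, not the thesis implementation; the vertex count, feature dimension, k-nearest-vertex blending, and the toy MLP are all assumptions made for the example.

```python
# Hypothetical sketch of a 3DMM-anchored radiance field: per-vertex local
# features condition a small MLP at each 3D query point. Everything here
# (shapes, k-NN blending, network size) is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

# A deformed 3DMM mesh for the current expression: V vertices in 3D, each
# carrying an expression-dependent local feature vector (dimension assumed 16).
num_vertices, feat_dim = 5023, 16
vertices = rng.normal(size=(num_vertices, 3))
vertex_features = rng.normal(size=(num_vertices, feat_dim))

# Stand-in weights for a tiny radiance-field head producing (density, RGB).
W1 = rng.normal(size=(3 + feat_dim, 64)) * 0.1
W2 = rng.normal(size=(64, 4)) * 0.1

def query_radiance_field(points, k=8):
    """For each query point, gather features from the k nearest mesh vertices,
    blend them by inverse distance, and run a small MLP to get (density, rgb)."""
    dists = np.linalg.norm(points[:, None, :] - vertices[None, :, :], axis=-1)
    knn_idx = np.argsort(dists, axis=1)[:, :k]                   # (N, k)
    knn_d = np.take_along_axis(dists, knn_idx, axis=1)           # (N, k)
    w = 1.0 / (knn_d + 1e-6)
    w /= w.sum(axis=1, keepdims=True)                            # normalized weights
    local_feat = (vertex_features[knn_idx] * w[..., None]).sum(axis=1)  # (N, F)
    x = np.concatenate([points, local_feat], axis=-1)            # anchored input
    h = np.maximum(x @ W1, 0.0)                                  # ReLU hidden layer
    out = h @ W2
    density = np.log1p(np.exp(out[:, :1]))    # softplus keeps density non-negative
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))   # sigmoid keeps colors in [0, 1]
    return density, rgb

# Query a few sample points along a camera ray.
sample_points = rng.normal(size=(4, 3))
density, rgb = query_radiance_field(sample_points)
print(density.shape, rgb.shape)  # (4, 1) (4, 3)
```

Because the features are attached to mesh vertices, they move with the 3DMM as the expression changes, so the radiance field is driven by the animated mesh while the network fills in photorealistic, expression-dependent detail.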
Document
Extent
101 pages.
Identifier
etd23113
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Tan, Ping
Thesis advisor: Furukawa, Yasutaka
Language
English
Member of collection
| Download file | Size |
| --- | --- |
| etd23113.pdf | 43.06 MB |