Tang, Sicong

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2024-06-24

Authors/Contributors

Author: Tang, Sicong

Abstract

In recent years, 3D human digitization has gained significant attention across various domains, including virtual reality, augmented reality, gaming, and healthcare. Sparse-view cameras offer a promising avenue for capturing human motion and appearance, because their setup makes systems easier to deploy and thus more practical. However, the limited observations available in the sparse-view setting pose challenges for accurate 3D modeling. This thesis focuses on generalized learning-based solutions for digitizing human bodies from sparse-view camera observations, aiming to develop practical solutions for diverse applications. First, we developed a neural network that estimates human body depth from a single RGB image, enabling real-time depth estimation with detailed geometric features for applications that require high-fidelity representation. Second, we proposed a generalized neural network architecture for reconstructing textured 3D human mesh models from sparse multi-view cameras. By leveraging spatial context information through 3D CNNs, this method efficiently produces detailed models and robustly handles occlusion, complex clothing, and multi-person scenarios. Third, building upon the second solution, we extended our research to the 4D Gaussian representation of dynamic human sequences. By leveraging initial mesh models, our proposed generalized network computes more efficient representations, enhancing computational efficiency while preserving temporal dynamics. All three works operate in a feed-forward manner and generate the human representation efficiently, so they can be applied to various demanding applications. The first is suitable for single-view, real-time scenarios. The second aims to achieve high-fidelity textured shapes from sparse camera input. The last uses a highly compact representation to address the transmission problem, which shows its potential for 4D VR video applications.

Extent

106 pages.

Keywords

Identifier

etd23127

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Tan, Ping

Thesis advisor: Furukawa, Yasutaka

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd23127.pdf	56.74 MB

Generalized dynamic human digitalization with sparse-view cameras
