Generalized dynamic human digitalization with sparse-view cameras

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2024-06-24
Authors/Contributors
Author: Tang, Sicong
Abstract
In recent years, 3D human digitization has gained significant attention across various domains, including virtual reality, augmented reality, gaming, and healthcare. Sparse-view cameras offer a promising avenue for capturing human motion and appearance, because their simple setup makes systems easier to deploy, enhancing practicality. However, the limited observations available in sparse-view settings pose challenges for accurate 3D modeling. This thesis focuses on generalized learning-based solutions for digitizing human bodies from sparse-view camera observations, aiming to develop practical solutions for diverse applications. First, we developed a neural network that estimates human body depth from single RGB images, enabling real-time depth estimation with detailed geometric features for applications that require high-fidelity representation. Second, we proposed a generalized neural network architecture for reconstructing textured 3D human mesh models from sparse multi-view cameras. By leveraging spatial context information through 3D CNNs, this method efficiently produces detailed models and robustly handles occlusion, complex clothing, and multi-person scenarios. Building upon the second solution, our research extended to a 4D Gaussian representation of dynamic human sequences. By leveraging initial mesh models, our proposed generalized network computes a more compact representation, improving computational efficiency while preserving temporal dynamics. All three works operate in a feed-forward manner and generate human representations efficiently, and each targets a different demanding application: the first suits single-view, real-time scenarios; the second achieves high-fidelity textured shapes from sparse camera input; and the last uses a highly compact representation to address the transmission problem, showing its potential for 4D VR video applications.
Document
Extent
106 pages.
Identifier
etd23127
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Tan, Ping
Thesis advisor: Furukawa, Yasutaka
Language
English
Member of collection
Download file
etd23127.pdf (56.74 MB)
