AvatarVerse: High-Quality & Stable 3D Avatar Creation from Text and Pose
Authors: Huichao Zhang, Bowen Chen, Hao Yang, Liao Qu, Xu Wang, Li Chen, Chao Long, Feida Zhu, Daniel Du, Min Zheng
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Rigorous qualitative evaluations and user studies showcase AvatarVerse's superiority in synthesizing high-fidelity 3D avatars, leading to a new standard in high-quality and stable 3D avatar creation. Our project page is: https://avatarverse3d.github.io/. (Section 4, Experiments) In this section, we illustrate the effectiveness of our proposed method. We demonstrate the efficacy of each proposed strategy and provide a detailed comparison against recent state-of-the-art methods. (Section 4.3, User Study) To further assess the quality of our generated 3D avatars, we conduct user studies comparing the performance of our results with four SOTA methods under the same text prompts. (Section 4.4, Ablation Study) To evaluate the design choices of AvatarVerse, we conduct an ablation study on the effectiveness of b) the progressive grid, c) the progressive radius, d) the focus mode, and e) the mesh refinement. |
| Researcher Affiliation | Collaboration | Huichao Zhang1*, Bowen Chen1*, Hao Yang1, Liao Qu1,2, Xu Wang1, Li Chen1, Chao Long1, Feida Zhu1, Daniel Du1, Min Zheng1. 1ByteDance, Beijing, China. 2Carnegie Mellon University, PA, USA |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our project page is: https://avatarverse3d.github.io/. This links only to a project page; the paper makes no explicit statement of a code release and gives no direct link to a code repository. |
| Open Datasets | Yes | We annotate the DeepFashion dataset (Liu et al. 2016) using a pretrained DensePose (Güler, Neverova, and Kokkinos 2018) model, resulting in over 800K image pairs. (A hypothetical sketch of this annotation step appears after the table.) |
| Dataset Splits | No | The paper mentions training iterations and stages but does not specify training, validation, and test dataset splits (e.g., percentages or counts). |
| Hardware Specification | Yes | The whole generation process takes around 2 hours on one single NVIDIA A100 GPU. |
| Software Dependencies | Yes | The diffusion model employed is SD1.5 (Stable Diffusion 1.5). |
| Experiment Setup | Yes | For each text prompt, we train AvatarVerse for 5000 and 4000 iterations in the coarse stage and mesh refinement stage, respectively. For the progressive grid, we double the number of voxels at 500, 1500, and 2000 iterations in the coarse stage. Our progressive radius consists of three stages, where the camera radius ranges from 1.4 to 2.1, 1 to 1.5, and 0.8 to 1.2 respectively. We reduce the radius at 1000 and 2000 iterations across both stages. Our focus mode starts from the 1000-th step in the coarse stage and is consistently employed throughout the mesh refinement phase. (A schedule sketch follows the table.) |
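
The annotation step quoted under Open Datasets pairs DeepFashion images with DensePose maps. The paper does not release this pipeline, so the following is only a minimal sketch of what such a step could look like. Nothing here is from the authors' code: `run_densepose` is a hypothetical placeholder for a pretrained DensePose inference backend (e.g., detectron2's DensePose project), and all paths and file naming are illustrative.

```python
# Hypothetical sketch of the DensePose annotation step (not the authors' pipeline).
from pathlib import Path

import numpy as np
from PIL import Image


def run_densepose(image: np.ndarray) -> np.ndarray:
    """Placeholder: return a per-pixel IUV map from a pretrained DensePose model."""
    raise NotImplementedError("Plug in a real DensePose inference backend here.")


def annotate_deepfashion(image_dir: str, out_dir: str) -> int:
    """Pair each DeepFashion image with its DensePose IUV annotation."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(image_dir).glob("*.jpg")):
        image = np.asarray(Image.open(path).convert("RGB"))
        iuv = run_densepose(image)  # (H, W, 3) IUV map for this image
        np.save(out / f"{path.stem}_iuv.npy", iuv)  # save the (image, pose) pair
        count += 1
    return count
```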
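
The training schedule quoted under Experiment Setup can be captured as a small configuration object, which makes the iteration thresholds easier to check. This is a minimal sketch under the quoted numbers only, not the authors' implementation; the class and method names (`StageSchedule`, `radius_range`, `grid_scale`, `focus_mode_on`) are illustrative assumptions.

```python
# Illustrative encoding of the quoted AvatarVerse training schedule.
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSchedule:
    """Sketch of one training stage's schedule (assumed names, quoted values)."""
    total_iters: int = 5000                      # 5000 coarse / 4000 refinement iterations
    grid_double_at: tuple = (500, 1500, 2000)    # coarse stage: double the voxel count here
    radius_shrink_at: tuple = (1000, 2000)       # radius reduced here, in both stages
    radius_stages: tuple = ((1.4, 2.1), (1.0, 1.5), (0.8, 1.2))
    focus_mode_from: int = 1000                  # always on in the refinement stage

    def radius_range(self, step: int) -> tuple:
        """Camera radius (min, max) in effect at a given iteration."""
        stage = sum(step >= s for s in self.radius_shrink_at)
        return self.radius_stages[stage]

    def grid_scale(self, step: int) -> int:
        """Voxel-count multiplier relative to the initial grid (doubles three times)."""
        return 2 ** sum(step >= s for s in self.grid_double_at)

    def focus_mode_on(self, step: int) -> bool:
        """Whether focus mode is active at this iteration of the coarse stage."""
        return step >= self.focus_mode_from
```

For example, `StageSchedule().radius_range(1500)` returns `(1.0, 1.5)` and `StageSchedule().grid_scale(1600)` returns 4 (doubled at iterations 500 and 1500), matching the quoted schedule.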