Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models
Authors: Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image, achieving high fidelity in both geometry and appearance. Extensive ablations also validate the efficacy of our design: (1) multi-view 2D prior conditioning in generative 3D reconstruction and (2) consistency refinement. |
| Researcher Affiliation | Academia | Yuxuan Xue (1,2), Xianghui Xie (1,2,3), Riccardo Marin (1,2), Gerard Pons-Moll (1,2,3); 1: University of Tübingen, 2: Tübingen AI Center, 3: Max Planck Institute for Informatics, Saarland Informatics Campus |
| Pseudocode | Yes | Algorithm 1: Training; Algorithm 2: 3D Consistent Sampling |
| Open Source Code | No | Our code and pretrained models will be publicly released on our project page. |
| Open Datasets | Yes | Datasets. We train our model on a combined 3D human dataset [1, 3, 4, 2, 21, 27, 65, 98] comprising 6,000 high-quality scans. |
| Dataset Splits | No | The paper states that it evaluates on the CAPE, Sizer, and IIIT datasets and randomly samples 150 scans from Sizer and IIIT for evaluation. This describes the test sets but not a distinct validation split used during training. |
| Hardware Specification | Yes | We trained our model on 8 NVIDIA A100 GPUs over approximately 5 days. Each GPU was configured with a batch size of 2 and gradient accumulation over 16 steps to achieve an effective batch size of 256 (see the configuration sketch after this table). |
| Software Dependencies | No | The paper mentions software components such as the DDPM scheduler, DDIM scheduler, ImageDream, VAE, Gaussian opacity field, TSDF, PIFuHD, and Bilateral Normal Integration, but it does not specify exact version numbers for any of them. |
| Experiment Setup | Yes | The hyperparameters for training Eq. (8) were set as follows: λ1 = 1.0, λ2 = 1.0, and λ3 = 100.0. During training, we employed the standard DDPM scheduler [25] to construct noisy target images x_t^{tgt}. The maximum diffusion step T is set to 1000. At inference time, we use the DDIM scheduler [63] to perform faster reverse sampling. The number of reverse steps is set to 50 in the following experiments. Each GPU was configured with a batch size of 2 and gradient accumulation over 16 steps to achieve an effective batch size of 256 (see the scheduler sketch after this table). |
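
As a minimal sketch of the reported training configuration, the snippet below recomputes the effective batch size from the per-GPU batch size, gradient-accumulation steps, and GPU count, and combines three loss terms with the weights given for Eq. (8). The loss terms themselves are hypothetical placeholders, since the excerpt does not name the components of Eq. (8).

```python
# Minimal sketch of the reported training configuration. Only the batch
# settings and the weights lambda_1..lambda_3 come from the paper; the
# three loss terms are hypothetical placeholders.

PER_GPU_BATCH = 2       # batch size per GPU
GRAD_ACCUM_STEPS = 16   # gradient-accumulation steps per optimizer update
NUM_GPUS = 8            # NVIDIA A100 GPUs

effective_batch = PER_GPU_BATCH * GRAD_ACCUM_STEPS * NUM_GPUS
assert effective_batch == 256  # matches the reported effective batch size

LAMBDA_1, LAMBDA_2, LAMBDA_3 = 1.0, 1.0, 100.0  # weights in Eq. (8)

def total_loss(term_1: float, term_2: float, term_3: float) -> float:
    """Weighted sum of the three (unnamed) loss terms of Eq. (8)."""
    return LAMBDA_1 * term_1 + LAMBDA_2 * term_2 + LAMBDA_3 * term_3
```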
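
The diffusion schedule described in the Experiment Setup row maps onto the Hugging Face `diffusers` schedulers for DDPM [25] and DDIM [63]. The sketch below assumes a hypothetical noise-prediction network `denoiser` and dummy image shapes; only the step counts (T = 1000 for training, 50 DDIM steps at inference) come from the paper.

```python
# Hedged sketch: DDPM noising of target views for training and 50-step DDIM
# reverse sampling at inference, using the Hugging Face `diffusers` schedulers.
# `denoiser` is a hypothetical stand-in for the paper's diffusion network.
import torch
from diffusers import DDPMScheduler, DDIMScheduler

T = 1000  # maximum diffusion step reported in the paper

# Training: construct noisy target images x_t^{tgt} with the DDPM scheduler.
ddpm = DDPMScheduler(num_train_timesteps=T)
x_tgt = torch.randn(2, 3, 256, 256)          # clean target views (dummy shapes)
noise = torch.randn_like(x_tgt)
t = torch.randint(0, T, (x_tgt.shape[0],))
x_tgt_t = ddpm.add_noise(x_tgt, noise, t)    # noisy targets fed to the network

# Inference: 50 reverse DDIM steps for faster sampling.
ddim = DDIMScheduler(num_train_timesteps=T)
ddim.set_timesteps(50)

def denoiser(sample: torch.Tensor, timestep) -> torch.Tensor:
    # Placeholder for the actual noise-prediction network.
    return torch.zeros_like(sample)

sample = torch.randn(1, 3, 256, 256)
for step in ddim.timesteps:
    eps = denoiser(sample, step)
    sample = ddim.step(eps, step, sample).prev_sample
```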