Human-3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

Authors: Yuxuan Xue, Xianghui Xie, Riccardo Marin, Gerard Pons-Moll

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Experiments show that our proposed framework outperforms state-of-the-art methods and enables the creation of realistic avatars from a single RGB image, achieving high fidelity in both geometry and appearance. Extensive ablations also validate the efficacy of our design: (1) multi-view 2D prior conditioning in generative 3D reconstruction and (2) consistency refinement."
Researcher Affiliation: Academia. Yuxuan Xue (1,2), Xianghui Xie (1,2,3), Riccardo Marin (1,2), Gerard Pons-Moll (1,2,3). Affiliations: 1 University of Tübingen; 2 Tübingen AI Center; 3 Max Planck Institute for Informatics, Saarland Informatics Campus.
Pseudocode: Yes. The paper provides Algorithm 1 (Training) and Algorithm 2 (3D Consistent Sampling).
Open Source Code: No. "Our code and pretrained models will be publicly released on our project page."
Open Datasets: Yes. "Datasets. We train our model on a combined 3D human dataset [1, 3, 4, 2, 21, 27, 65, 98] comprising 6,000 high-quality scans."
Dataset Splits: No. The paper evaluates on the CAPE, Sizer, and IIIT datasets, randomly sampling 150 scans from Sizer and IIIT for evaluation. This describes the test sets, but no distinct validation split used during training is specified.
Hardware Specification: Yes. "We trained our model on 8 NVIDIA A100 GPUs over approximately 5 days. Each GPU was configured with a batch size of 2 and gradient accumulation over 16 steps to achieve an effective batch size of 256."
Software Dependencies: No. The paper mentions components such as the DDPM and DDIM schedulers, ImageDream, a VAE, Gaussian opacity fields, TSDF fusion, PIFuHD, and Bilateral Normal Integration, but it does not specify exact version numbers for any of these software components.
Experiment Setup: Yes. "The hyperparameters for training Eq. (8) were set as follows: λ1 = 1.0, λ2 = 1.0, and λ3 = 100.0. During training, we employed the standard DDPM scheduler [25] to construct noisy target images x_t^tgt. The maximum diffusion step T is set to 1000. At inference time, we use the DDIM scheduler [63] to perform faster reverse sampling. The number of reverse steps is set to 50 in the following experiments. Each GPU was configured with a batch size of 2 and gradient accumulation over 16 steps to achieve an effective batch size of 256."
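The reported numbers can be sanity-checked with a short sketch. This is a minimal illustration, not the authors' code: it assumes standard evenly spaced DDIM timestep subsampling from T = 1000 down to 50 reverse steps, and it treats the λ-weighted loss of Eq. (8) generically, with `l_diffusion`, `l_render`, and `l_reg` as hypothetical placeholder names for the paper's loss terms.

```python
# Sanity-check sketch of the reported training/inference configuration.
# Illustrative only; not the authors' implementation.

# Effective batch size: 8 GPUs x per-GPU batch of 2 x 16 accumulation steps.
NUM_GPUS, PER_GPU_BATCH, ACCUM_STEPS = 8, 2, 16
effective_batch = NUM_GPUS * PER_GPU_BATCH * ACCUM_STEPS  # 256

# DDIM-style subsampling: 50 evenly spaced reverse steps out of T = 1000.
T, NUM_INFERENCE_STEPS = 1000, 50
stride = T // NUM_INFERENCE_STEPS
ddim_timesteps = list(range(0, T, stride))[::-1]  # descending: 980, 960, ..., 0

# Weighted sum implied by the Eq. (8) hyperparameters; the mapping of each
# lambda to a specific loss term is an assumption, not taken from the paper.
def total_loss(l_diffusion, l_render, l_reg, lam=(1.0, 1.0, 100.0)):
    return lam[0] * l_diffusion + lam[1] * l_render + lam[2] * l_reg

if __name__ == "__main__":
    print(effective_batch)        # 256
    print(len(ddim_timesteps))    # 50
```

Note that the per-GPU batch of 2 with 16 accumulation steps across 8 GPUs multiplies out exactly to the effective batch size of 256 quoted in the paper.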