YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals

Authors: Sandeep Mishra, Oindrila Saha, Alan Bovik

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we compare YouDream against various baselines and evaluate the effect of various components of our method. We show qualitative comparisons with text-to-3D methods... We also compare with MVDream... We conduct a user study to quantitatively evaluate our method against these baselines. We also compute the CLIP score following previous work (Shi et al., 2023), shown in Sec. I. Additionally, we present ablations over the various modules that constitute YouDream.
Researcher Affiliation | Academia | University of Texas at Austin; University of Massachusetts Amherst
Pseudocode | No | The paper describes its methodology in text and equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Visualizations and code are available at https://youdream3d.github.io/. ... We open-source this setup along with our project code.
Open Datasets | Yes | We used annotated poses from the AwA-pose Banik et al. (2021) and Animal Kingdom Ng et al. (2022) datasets to train ControlNet in a similar way as the original paper, which uses Stable Diffusion version 1.5.
Dataset Splits | No | The paper specifies training details such as 'trained over 229k iterations' and 'pre-trained for 10,000 iterations', but it does not explicitly provide information on training/test/validation dataset splits or refer to a validation set.
Hardware Specification | Yes | The model was trained over 229k iterations with a batch size of 12, a constant learning rate of 1e-5, on a single Nvidia RTX 6000. ... The pre-training stage, if used, takes less than 12 minutes to complete, while the fine-tuning stage takes less than 40 minutes to complete on a single A100 40GB GPU. ... All the experiments pertaining to YouDream and 3DFuse were run on an Nvidia A100 40GB GPU. A few experiments for MVDream and all experiments of HiFA required running on an A100 80GB GPU, while all experiments for Fantasia3D were run on 3x A100 40GB GPUs.
Software Dependencies | Yes | We used annotated poses from the AwA-pose Banik et al. (2021) and Animal Kingdom Ng et al. (2022) datasets to train ControlNet in a similar way as the original paper, which uses Stable Diffusion version 1.5. ... The diffusion model was pre-trained for 10,000 iterations using the Adam optimizer with a learning rate of 1e-3 and a batch size of 1. ... We use Instant-NGP Müller et al. (2022) as the NeRF representation. ... This tool was developed using THREE.js and can be run on any web browser. ... We use the recently released GPT-4o API of OpenAI with max_tokens as 4096 and temperature as 0.9. (A minimal sketch of such an API call appears after the table.)
Experiment Setup | Yes | The model was trained over 229k iterations with a batch size of 12, a constant learning rate of 1e-5, on a single Nvidia RTX 6000. ... The diffusion model was pre-trained for 10,000 iterations using the Adam optimizer with a learning rate of 1e-3 and a batch size of 1. During training, the camera positions were randomly sampled in spherical coordinates, where the radius, azimuth, and polar angle of the camera position were sampled from [1.0, 2.0], [0°, 360°], and [60°, 120°], respectively. ... We set t_max to be 0.98 and t_min to be 0.4. Similar to the previous stage, we trained the model over total_iters = 10,000 using the same settings for the optimizer. Using cosine annealing, we reduced the control_scale from an initial value of 1 to a final value of 0.2, while updating guidance_scale linearly from guidance_min = 50 to guidance_max = 100. ... λ_RGB was set to 0.01. (A sketch of the camera sampling and these schedules appears after the table.)
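
The GPT-4o usage quoted under Software Dependencies maps onto the standard OpenAI Python client roughly as follows. This is a minimal sketch, not the authors' code: only max_tokens=4096 and temperature=0.9 come from the quoted description, while the prompt content and message structure are placeholders.

```python
# Minimal sketch of the GPT-4o call described under Software Dependencies.
# Only max_tokens=4096 and temperature=0.9 come from the paper's description;
# the prompt text below is a hypothetical placeholder, not the authors' prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Describe a plausible 3D keypoint pose for the requested animal.",  # placeholder
        },
    ],
    max_tokens=4096,
    temperature=0.9,
)
print(response.choices[0].message.content)
```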
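
Similarly, the camera sampling ranges and annealing schedules quoted under Experiment Setup can be written out as below. This is a sketch under the stated values (radius in [1.0, 2.0], azimuth in [0°, 360°], polar angle in [60°, 120°], total_iters = 10,000, control_scale cosine-annealed from 1 to 0.2, guidance_scale increased linearly from 50 to 100); the function names and the spherical-to-Cartesian convention are assumptions, not the released implementation.

```python
# Sketch of the camera sampling and schedules quoted under Experiment Setup
# (assumed forms; not the released YouDream code).
import math
import random

TOTAL_ITERS = 10_000  # total_iters from the quoted setup


def sample_camera_position():
    """Sample a camera position in spherical coordinates within the quoted ranges."""
    radius = random.uniform(1.0, 2.0)
    azimuth = math.radians(random.uniform(0.0, 360.0))
    polar = math.radians(random.uniform(60.0, 120.0))
    # Spherical-to-Cartesian conversion (z-up convention assumed here).
    x = radius * math.sin(polar) * math.cos(azimuth)
    y = radius * math.sin(polar) * math.sin(azimuth)
    z = radius * math.cos(polar)
    return x, y, z


def control_scale(it, start=1.0, end=0.2):
    """Cosine annealing from `start` to `end` over TOTAL_ITERS iterations."""
    t = it / TOTAL_ITERS
    return end + 0.5 * (start - end) * (1.0 + math.cos(math.pi * t))


def guidance_scale(it, g_min=50.0, g_max=100.0):
    """Linear increase from guidance_min to guidance_max over TOTAL_ITERS iterations."""
    t = it / TOTAL_ITERS
    return g_min + (g_max - g_min) * t
```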