YouDream: Generating Anatomically Controllable Consistent Text-to-3D Animals
Authors: Sandeep Mishra, Oindrila Saha, Alan Bovik
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we compare YouDream against various baselines and evaluate the effect of various components of our method. We show qualitative comparison with text-to-3D methods... We also compare with MVDream... We conduct a user study to quantitatively evaluate our method against these baselines. We also compute CLIP score following previous work Shi et al. (2023), shown in Sec. I. Additionally, we present ablations over the various modules that constitute YouDream. |
| Researcher Affiliation | Academia | University of Texas at Austin, University of Massachusetts Amherst |
| Pseudocode | No | The paper describes its methodology in text and equations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Visualizations and code are available at https://youdream3d.github.io/. ... We open-source this setup along with our project code. |
| Open Datasets | Yes | We used annotated poses from the AwA-pose Banik et al. (2021) and Animal Kingdom Ng et al. (2022) datasets to train ControlNet in a similar way as the original paper, which uses Stable Diffusion version 1.5. |
| Dataset Splits | No | The paper specifies training details such as 'trained over 229k iterations' and 'pre-trained for 10,000 iterations', but it does not explicitly provide information on training/test/validation dataset splits or refer to a validation set. |
| Hardware Specification | Yes | The model was trained over 229k iterations with a batch size of 12, a constant learning rate of 1e-5, on a single Nvidia RTX 6000. ... The pre-training stage, if used, takes less than 12 minutes to complete, while the fine-tuning stage takes less than 40 minutes to complete on a single A100 40GB GPU. ... All the experiments pertaining to YouDream and 3DFuse were run on Nvidia A100 40GB GPUs. A few experiments for MVDream and all experiments of HiFA required running on an A100 80GB GPU, while all experiments for Fantasia3D were run on 3x A100 40GB GPUs. |
| Software Dependencies | Yes | We used annotated poses from the AwA-pose Banik et al. (2021) and Animal Kingdom Ng et al. (2022) datasets to train ControlNet in a similar way as the original paper, which uses Stable Diffusion version 1.5. ... The diffusion model was pre-trained for 10,000 iterations using the Adam optimizer with a learning rate of 1e-3 and a batch size of 1. ... We use Instant-NGP Müller et al. (2022) as the NeRF representation. ... This tool was developed using THREE.js and can be run on any web browser. ... We use the recently released GPT-4o API of OpenAI with max_tokens as 4096 and temperature as 0.9. (An illustrative API-call sketch follows the table.) |
| Experiment Setup | Yes | The model was trained over 229k iterations with a batch size of 12, a constant learning rate of 1e-5, on a single Nvidia RTX 6000. ... The diffusion model was pre-trained for 10,000 iterations using the Adam optimizer with a learning rate of 1e-3 and a batch size of 1. During training, the camera positions were randomly sampled in spherical coordinates, where the radius, azimuth, and polar angle of the camera position were sampled from [1.0, 2.0], [0, 360], and [60, 120]. ... We set t_max to be 0.98, t_min to be 0.4. Similar to the previous stage, we trained the model over total_iters = 10,000 using the same settings for the optimizer. Using cosine annealing, we reduced the control_scale from an initial value of 1 to a final value of 0.2, while updating guidance_scale linearly from guidance_min = 50 to guidance_max = 100. ... λ_RGB was set to 0.01. (An illustrative schedule sketch follows the table.) |
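The hyperparameter schedules and camera sampling quoted in the Experiment Setup row can be summarized in a minimal Python sketch. This is an illustration of the quoted values only, not the authors' released code; all function names (`control_scale`, `guidance_scale`, `sample_camera`) are hypothetical.

```python
import math
import random

# Values quoted in the Experiment Setup row.
total_iters = 10_000
control_init, control_final = 1.0, 0.2    # control_scale annealed 1 -> 0.2 (cosine)
guidance_min, guidance_max = 50.0, 100.0  # guidance_scale ramped 50 -> 100 (linear)

def control_scale(t: int) -> float:
    """Cosine annealing from control_init down to control_final over total_iters."""
    progress = t / max(total_iters - 1, 1)
    return control_final + 0.5 * (control_init - control_final) * (1 + math.cos(math.pi * progress))

def guidance_scale(t: int) -> float:
    """Linear ramp from guidance_min to guidance_max over total_iters."""
    progress = t / max(total_iters - 1, 1)
    return guidance_min + (guidance_max - guidance_min) * progress

def sample_camera() -> tuple[float, float, float]:
    """Random spherical camera pose: radius in [1, 2], azimuth in [0, 360] deg, polar in [60, 120] deg."""
    radius = random.uniform(1.0, 2.0)
    azimuth = random.uniform(0.0, 360.0)
    polar = random.uniform(60.0, 120.0)
    return radius, azimuth, polar
```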
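The Software Dependencies row quotes a GPT-4o API call with max_tokens 4096 and temperature 0.9. A minimal sketch of such a call, assuming the current `openai` Python client; the prompt below is a placeholder, not the paper's actual prompt.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder prompt for illustration only.
response = client.chat.completions.create(
    model="gpt-4o",
    max_tokens=4096,
    temperature=0.9,
    messages=[
        {"role": "user", "content": "Describe the 3D pose of a galloping horse as keypoints."},
    ],
)
print(response.choices[0].message.content)
```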