IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
Authors: Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive set of experiments, both qualitative and quantitative, that demonstrate the effectiveness of our method over baseline approaches. Our empirical findings show that our proposed method significantly improves the baseline models in terms of texture detail, geometry, and fidelity between text prompts and the resulting 3D objects. |
| Researcher Affiliation | Collaboration | Yiwen Chen*1,2, Chi Zhang*3, Xiaofeng Yang2, Zhongang Cai4, Gang Yu3, Lei Yang4, Guosheng Lin1,2. Affiliations: 1 S-Lab, Nanyang Technological University; 2 School of Computer Science and Engineering, Nanyang Technological University; 3 Tencent PCG, China; 4 SenseTime Research |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating its own dataset (D') by rendering a coarse 3D model and employing image-to-image pipelines, but it does not state that this generated dataset is publicly available or provide access information for it. |
| Dataset Splits | No | The paper mentions training steps and convergence criteria but does not specify exact training, validation, and test dataset splits by percentage or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using ControlNet and Stable Diffusion models, but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, specific library versions). |
| Experiment Setup | Yes | The learning rates for the NeRF and the discriminator are set to 1e-3 and 2e-3, respectively. For our baseline method, complex prompts such as avatars typically require approximately 20k steps to achieve complete convergence, while simpler prompts like flowers converge around 15k steps. To ensure a fair comparison, we train the baseline method for 25k steps (1.5 to 2.5 GPU hours) to guarantee the full convergence of the 3D model. As for our method, we resume training from 5k to 15k steps of the baseline method, varying based on the complexity of the prompt. A minimal sketch of this optimizer and resume setup follows the table. |
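The reported setup (per-network learning rates and resuming partway through baseline training) can be illustrated with a short training-loop sketch. This is a minimal illustration only, assuming a PyTorch-style setup with placeholder `nerf` and `discriminator` modules and stub losses; the paper links no code, so the actual IT3D rendering and diffusion/GAN losses are not reproduced here.

```python
# Minimal sketch of the reported optimizer and resume settings (not the authors' code).
# Assumptions: PyTorch; tiny placeholder modules stand in for the radiance field and
# the discriminator; the real IT3D losses (diffusion-guided / GAN) are replaced by stubs.
import torch
import torch.nn as nn

nerf = nn.Sequential(nn.Linear(63, 256), nn.ReLU(), nn.Linear(256, 4))          # stand-in NeRF
discriminator = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 1))  # stand-in discriminator

# Learning rates quoted in the paper: 1e-3 for the NeRF, 2e-3 for the discriminator.
opt_nerf = torch.optim.Adam(nerf.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=2e-3)

baseline_steps = 25_000   # baseline trained this long to guarantee full convergence
resume_step = 10_000      # the method resumes from 5k-15k steps, depending on prompt complexity

for step in range(resume_step, baseline_steps):
    # Placeholder forward pass; the actual method renders novel views and applies
    # diffusion-guided and discriminator losses at this point.
    x = torch.randn(8, 63)
    pred = nerf(x)
    loss_nerf = pred.pow(2).mean()                           # stub generator-side loss
    loss_disc = discriminator(pred.detach()).pow(2).mean()   # stub discriminator loss

    opt_nerf.zero_grad(); loss_nerf.backward(); opt_nerf.step()
    opt_disc.zero_grad(); loss_disc.backward(); opt_disc.step()
```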