IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

Authors: Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang, Guosheng Lin

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a comprehensive set of experiments, both qualitative and quantitative, that demonstrate the effectiveness of our method over baseline approaches. Our empirical findings show that the proposed method significantly improves the baseline models in terms of texture detail, geometry, and fidelity between text prompts and the resulting 3D objects.
Researcher Affiliation | Collaboration | Yiwen Chen*1,2, Chi Zhang*3, Xiaofeng Yang2, Zhongang Cai4, Gang Yu3, Lei Yang4, Guosheng Lin1,2 (1 S-Lab, Nanyang Technological University; 2 School of Computer Science and Engineering, Nanyang Technological University; 3 Tencent PCG, China; 4 SenseTime Research)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper describes generating its own dataset (D') by rendering the coarse 3D model and employing image-to-image pipelines (an illustrative sketch of this step appears after the table), but it does not state that this generated dataset is publicly available or provide access information for it.
Dataset Splits | No | The paper mentions training steps and convergence criteria but does not specify exact training, validation, and test dataset splits by percentage or sample counts.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions using ControlNet and Stable Diffusion models, but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, specific library versions).
Experiment Setup | Yes | The learning rates for the NeRF and the discriminator are set to 1e-3 and 2e-3, respectively (a minimal sketch of this setup follows the table). For the baseline method, complex prompts such as avatars typically require approximately 20k steps to achieve complete convergence, while simpler prompts such as flowers converge around 15k steps. To ensure a fair comparison, the baseline method is trained for 25k steps (1.5 to 2.5 GPU hours) to guarantee full convergence of the 3D model. The proposed method resumes training from 5k to 15k steps of the baseline, varying with the complexity of the prompt.
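
The Open Datasets row notes that D' is built by refining rendered views of the coarse 3D model with image-to-image pipelines. The following is a minimal sketch of that step, not the authors' released code: it refines a single rendered view with a ControlNet-guided Stable Diffusion image-to-image pipeline from the diffusers library. The model IDs, the Canny-edge control signal, the file names, the prompt, and the strength value are illustrative assumptions.

```python
# Hypothetical sketch of generating one sample of D' from a coarse NeRF render.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Load a ControlNet (assumed Canny-conditioned) and a Stable Diffusion img2img pipeline.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# One view rendered from the coarse 3D model (file name is a placeholder).
coarse_view = Image.open("coarse_render_view_000.png").convert("RGB")

# Build the control image from the render's edges (assumed conditioning choice).
edges = cv2.Canny(np.array(coarse_view), 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

# Refine the view while keeping it anchored to the coarse geometry.
refined = pipe(
    prompt="a DSLR photo of a rose",   # example text prompt
    image=coarse_view,
    control_image=control_image,
    strength=0.6,                      # assumed img2img strength
    num_inference_steps=30,
).images[0]
refined.save("dataset_dprime_view_000.png")  # one synthesized sample of D'
```

Repeating this over many sampled camera poses would yield the explicitly synthesized view set D' that the paper uses to supervise the later refinement stage.

The Experiment Setup row quotes learning rates of 1e-3 for the NeRF and 2e-3 for the discriminator, a 25k-step baseline, and resumption from 5k-15k steps. Below is a minimal sketch of that optimization setup under stated assumptions: the optimizer type (Adam), the network definitions, and the specific resume step are placeholders, not details given in the paper.

```python
# Hypothetical training configuration reflecting only the hyperparameters quoted above.
import torch

# Placeholder networks standing in for the NeRF and the GAN discriminator.
nerf = torch.nn.Sequential(
    torch.nn.Linear(63, 256), torch.nn.ReLU(), torch.nn.Linear(256, 4)
)
discriminator = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 64 * 64, 256), torch.nn.LeakyReLU(0.2), torch.nn.Linear(256, 1)
)

# Learning rates from the paper; the choice of Adam is an assumption.
opt_nerf = torch.optim.Adam(nerf.parameters(), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=2e-3)

baseline_steps = 25_000   # baseline trained for 25k steps to guarantee convergence
resume_step = 10_000      # IT3D resumes from 5k-15k steps depending on prompt complexity
```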
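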