Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
Authors: Junyoung Seo, Wooseok Jang, Min-Seop Kwak, Hyeonsu Kim, Jaehoon Ko, Junho Kim, Jin-Hwa Kim, Jiyoung Lee, Seungryong Kim
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally demonstrate how our method notably enhances the 3D consistency of generated scenes compared to previous baselines, achieving state-of-the-art performance in geometric robustness and fidelity. The project page is available at https://ku-cvlab.github.io/3DFuse/. We extensively demonstrate the effectiveness of our framework with qualitative analyses and ablation studies. Moreover, we introduce an innovative metric for quantitatively assessing the 3D consistency of the generated scenes. |
| Researcher Affiliation | Collaboration | Junyoung Seo¹, Wooseok Jang¹, Min-Seop Kwak¹, Hyeonsu Kim¹, Jaehoon Ko¹, Junho Kim², Jin-Hwa Kim²,³, Jiyoung Lee², Seungryong Kim¹ — ¹Korea University, ²NAVER AI Lab, ³AI Institute of Seoul National University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The project page is available at https://ku-cvlab.github.io/3DFuse/ |
| Open Datasets | Yes | We use the Co3D (Reizenstein et al., 2021) dataset to train our sparse depth injector. The dataset is comprised of 50 categories and 5,625 annotated point cloud videos. |
| Dataset Splits | No | The paper describes the training data for the sparse depth injector and the evaluation setup for the main model (user study), but does not provide explicit train/validation/test splits or percentages for the datasets used to train or evaluate the primary model. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a "Stable Diffusion model based on LDM" and specifies a "Stable Diffusion v1.5 checkpoint". It also references libraries such as "PyTorch3D (Ravi et al., 2020)" and models such as "ControlNet (Zhang & Agrawala, 2023)", "MiDaS (Ranftl et al., 2020)", and "Karlo (Donghoon Lee et al., 2022) based on unCLIP (Ramesh et al., 2022)". While specific versions are given for some models/checkpoints, a comprehensive list of all required software dependencies with specific version numbers (e.g., Python, PyTorch, CUDA) is not provided, which limits full reproducibility of the software environment. |
| Experiment Setup | Yes | We use scaling factors of 0.3 and 1.0 on the features passing through the LoRA layers and the sparse depth injector, respectively. For ease of training, we start from the weights (Zhang & Agrawala, 2023) trained on text-to-image pairs with MiDaS depth and fine-tune the model using the sparse depth maps synthesized from the Co3D dataset for 2 additional epochs. |
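
The "Experiment Setup" row states that features passing through the LoRA layers and the sparse depth injector are scaled by 0.3 and 1.0, respectively, before being fused with the frozen diffusion backbone. Below is a minimal, hedged PyTorch sketch of that scaling; the module and variable names (`lora_layer`, `depth_injector`, `base_features`, `sparse_depth`) are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the feature scaling described in the Experiment Setup row.
# Assumptions: a LoRA adapter and a sparse depth injector each produce a
# feature residual that is added to the backbone features after scaling.
import torch
import torch.nn as nn

LORA_SCALE = 0.3            # scaling factor for LoRA features (from the paper)
DEPTH_INJECTOR_SCALE = 1.0  # scaling factor for sparse depth injector features

class ScaledResidualFusion(nn.Module):
    """Adds scaled LoRA and depth-injector features to the backbone features."""
    def __init__(self, lora_layer: nn.Module, depth_injector: nn.Module):
        super().__init__()
        self.lora_layer = lora_layer
        self.depth_injector = depth_injector

    def forward(self, base_features: torch.Tensor, sparse_depth: torch.Tensor) -> torch.Tensor:
        lora_feat = self.lora_layer(base_features)          # adapter residual
        depth_feat = self.depth_injector(sparse_depth)      # depth-conditioned residual
        return base_features + LORA_SCALE * lora_feat + DEPTH_INJECTOR_SCALE * depth_feat
```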
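
The "Open Datasets" and "Experiment Setup" rows also mention sparse depth maps synthesized from Co3D point clouds. As an illustration only (not the authors' code, and not using PyTorch3D), the following sketch rasterizes a point cloud into a sparse depth map with a standard pinhole camera; `points_world`, `K`, `R`, `t`, and the z-buffer logic are hypothetical details chosen for the example.

```python
# Illustrative sketch: project a world-space point cloud into a sparse depth map.
# points_world: (N, 3) points, K: 3x3 intrinsics, R/t: world-to-camera pose.
import numpy as np

def sparse_depth_map(points_world, K, R, t, height, width):
    # Transform points into the camera frame and keep those in front of the camera.
    cam = points_world @ R.T + t
    cam = cam[cam[:, 2] > 0]
    # Pinhole projection: pixel = K @ (x/z, y/z, 1).
    uv = (cam / cam[:, 2:3]) @ K.T
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    z = cam[:, 2]
    # Keep projections that land inside the image bounds.
    valid = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[valid], v[valid], z[valid]
    # Z-buffer: when several points hit the same pixel, keep the nearest one.
    depth = np.full((height, width), np.inf)
    order = np.argsort(-z)              # far points written first, near points overwrite
    depth[v[order], u[order]] = z[order]
    depth[np.isinf(depth)] = 0.0        # 0 marks pixels with no observed depth (sparse)
    return depth
```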