GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Authors: Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong, Jinlin He, Zhiwei Lin, Yongtao Wang, Deqing Sun, Ming-Hsuan Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing while ensuring the high fidelity of object-level entities within the scene. |
| Researcher Affiliation | Collaboration | ¹Wangxuan Institute of Computer Technology, Peking University; ²National Key Laboratory for Multimedia Information Processing; ³Google DeepMind; ⁴University of California, Merced. |
| Pseudocode | No | The paper describes the proposed method in detail in Section 3, including mathematical formulations, but it does not present any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source codes and models will be available at gala3d.github.io. |
| Open Datasets | No | The paper mentions using 'text prompts' for evaluation and comparison and refers to 'zero-shot text-to-3D generation', but it does not provide concrete access information (e.g., links, DOIs, citations with authors/year) for specific publicly available or open datasets used for training or evaluation. |
| Dataset Splits | No | The paper mentions evaluating performance based on 'text prompts containing varying numbers of objects' and using CLIP Score. However, it does not specify any training/validation/test dataset splits, percentages, or sample counts, nor does it refer to predefined splits from established benchmarks. |
| Hardware Specification | Yes | All the experiments are carried out on a single A800 with 80 GB memory. |
| Software Dependencies | No | We utilize MVDream (Shi et al., 2023) as the multi-view diffusion prior... We use ControlNet (Zhang et al., 2023a) for compositional optimization... The paper names the software tools it relies on but does not provide their version numbers (e.g., 'MVDream' without a version). |
| Experiment Setup | Yes | We utilize MVDream (Shi et al., 2023) as the multi-view diffusion model, with a guidance scale of 50. The guidance scale of ControlNet is set to 100... For the 3DGS, the learning rates of opacity and position are 5e-2 and 1.6e-4. The color of 3D Gaussians is represented by the spherical harmonic coefficient, with the degree set to 0 and the learning rate set to 5e-3. The covariance of the 3D Gaussians is converted into scaling and rotation for optimization, with learning rates of 5e-3 and 1e-3, respectively. We set coefficients β1 through β5 as β1 = 1, β2 = 1e3, β3 = 1e-1, β4 = 1e-1, and β5 = 1e3 to balance the magnitude of the losses. For each instance, we initialize the 3D Gaussians with 100,000 particles... The sampling radius of the camera is set to the scene range in the spherical coordinate system, while horizontal angles are uniformly sampled at 360 degrees. |
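
The Experiment Setup row is the one place in this report that pins down concrete hyperparameters, so a compact configuration sketch may help readers wiring these values into their own pipeline. This is a minimal sketch only: the class and field names below (`Gala3DConfig`, `lr_opacity`, and so on) are illustrative choices of ours, not identifiers from the authors' code at gala3d.github.io.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gala3DConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are illustrative)."""

    # Diffusion guidance scales
    mvdream_guidance_scale: float = 50.0      # MVDream multi-view diffusion prior
    controlnet_guidance_scale: float = 100.0  # ControlNet for compositional optimization

    # 3D Gaussian Splatting learning rates
    lr_opacity: float = 5e-2
    lr_position: float = 1.6e-4
    lr_sh_color: float = 5e-3   # spherical-harmonic color, degree set to 0
    lr_scaling: float = 5e-3    # covariance factored into scaling ...
    lr_rotation: float = 1e-3   # ... and rotation
    sh_degree: int = 0

    # Loss-balancing coefficients (beta_1 .. beta_5)
    betas: tuple = (1.0, 1e3, 1e-1, 1e-1, 1e3)

    # Initialization and camera sampling
    gaussians_per_instance: int = 100_000        # particles per object instance
    horizontal_angle_deg: tuple = (0.0, 360.0)   # horizontal angles sampled uniformly
    # The camera radius equals the scene range (scene-dependent), so no default is given.


if __name__ == "__main__":
    print(Gala3DConfig())
```

Collecting the reported values in a frozen dataclass keeps the setup immutable and easy to log, which makes reproduction attempts straightforward to compare against the numbers quoted above.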