DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation

Authors: Jiaxiang Tang, Jiawei Ren, Hang Zhou, Ziwei Liu, Gang Zeng

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the superior efficiency and competitive generation quality of our proposed approach.
Researcher Affiliation | Collaboration | Jiaxiang Tang1, Jiawei Ren2, Hang Zhou3, Ziwei Liu2, Gang Zeng1 (1 National Key Laboratory of General AI, School of IST, Peking University; 2 S-Lab, Nanyang Technological University; 3 Baidu Inc.)
Pseudocode | No | The paper describes algorithms in text but does not include any figure, block, or section explicitly labeled "Pseudocode" or "Algorithm".
Open Source Code | No | The paper includes the URL https://dreamgaussian.github.io/ but does not explicitly state that the source code for the methodology is released there, or provide a direct link to a code repository (e.g., github.com/username/repo).
Open Datasets | Yes | A dataset of 30 images, collected from previous works (Melas-Kyriazi et al., 2023; Liu et al., 2023a; Tang et al., 2023b; Liu et al., 2023c) and the Internet and covering various objects, is used.
Dataset Splits | No | The paper describes using a dataset collected from previous works and the Internet for evaluation, but does not specify exact training/validation/test dataset splits (e.g., percentages or absolute counts) for model training.
Hardware Specification | Yes | All experiments are performed and measured with an NVIDIA V100 (16GB) GPU, while our method requires less than 8 GB GPU memory.
Software Dependencies | No | The paper mentions tools like NVdiffrast (Laine et al., 2020) and models like CLIP-ViT-bigG-14-laion2B-39B-b160k, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train 500 steps for the first stage and 50 steps for the second stage. The 3D Gaussians are initialized to 0.1 opacity and grey color inside a sphere of radius 0.5. The rendering resolution is increased from 64 to 512 for Gaussian splatting, and randomly sampled from 128 to 1024 for mesh. The loss weights for RGB and transparency are linearly increased from 0 to 10^4 and 10^3 during training. We sample random camera poses at a fixed radius of 2 for image-to-3D and 2.5 for text-to-3D, with a y-axis FOV of 49 degrees, azimuth in [-180, 180] degrees, and elevation in [-30, 30] degrees. The background is rendered randomly as white or black for Gaussian splatting. For the image-to-3D task, the two stages each take around 1 minute. We preprocess the input image by background removal (Qin et al., 2020) and recentering of the foreground object. The 3D Gaussians are initialized with 5000 random particles and densified every 100 steps. For the text-to-3D task, due to the larger 512 x 512 resolution used by the Stable Diffusion (Rombach et al., 2022) model, each stage takes around 2 minutes to finish. We initialize the 3D Gaussians with 1000 random particles and densify them every 50 steps. For mesh extraction, we use an empirical threshold of 1 for Marching Cubes.
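
To make the quoted setup concrete, below is a minimal numpy sketch of three of the reported choices: fixed-radius camera sampling, the linear loss-weight ramp, and the sphere-based Gaussian initialization. All function names are hypothetical illustrations rather than the authors' code, and details the paper leaves unspecified (e.g., the exact pose convention) are assumptions.

```python
import numpy as np

# Minimal sketch of the quoted setup. Function names are hypothetical;
# this is not the authors' released implementation.

def sample_camera_position(radius=2.0, azimuth_deg=(-180.0, 180.0),
                           elevation_deg=(-30.0, 30.0)):
    """Sample a camera position on a sphere of fixed radius
    (2 for image-to-3D, 2.5 for text-to-3D), with azimuth in
    [-180, 180] and elevation in [-30, 30] degrees."""
    azim = np.radians(np.random.uniform(*azimuth_deg))
    elev = np.radians(np.random.uniform(*elevation_deg))
    # y-up convention assumed, matching the paper's y-axis FOV
    x = radius * np.cos(elev) * np.sin(azim)
    y = radius * np.sin(elev)
    z = radius * np.cos(elev) * np.cos(azim)
    return np.array([x, y, z])

def ramped_weight(step, total_steps, max_weight):
    """Linearly increase a loss weight from 0 to max_weight, as
    reported for the RGB (1e4) and transparency (1e3) terms."""
    return max_weight * min(step / total_steps, 1.0)

def init_gaussians(n=5000, sphere_radius=0.5, opacity=0.1):
    """Initialize n random particles uniformly inside a sphere of
    radius 0.5, with 0.1 opacity and grey color."""
    directions = np.random.normal(size=(n, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # cube root of a uniform sample gives uniform density in volume
    radii = sphere_radius * np.cbrt(np.random.uniform(size=(n, 1)))
    positions = directions * radii
    opacities = np.full((n, 1), opacity)
    colors = np.full((n, 3), 0.5)  # grey
    return positions, opacities, colors
```

Under these assumptions, the first training stage would call sample_camera_position once per step and scale the RGB loss by ramped_weight(step, 500, 1e4), with init_gaussians providing the 5000-particle starting point for image-to-3D.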