MVGamba: Unify 3D Content Generation as State Space Sequence Modeling
Authors: Xuanyu Yi, Zike Wu, Qiuhong Shen, Qingshan Xu, Pan Zhou, Joo-Hwee Lim, Shuicheng Yan, Xinchao Wang, Hanwang Zhang
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only 0.1× of the model size. We conducted comprehensive qualitative and quantitative experiments to verify the efficacy of our proposed MVGamba. |
| Researcher Affiliation | Collaboration | ¹Nanyang Technological University, ²National University of Singapore, ³University of British Columbia, ⁴Singapore Management University, ⁵Institute for Infocomm Research, ⁶Skywork AI |
| Pseudocode | Yes | Figure 14: The pseudocode for the Gaussian parameter constraint. We provide the detailed pseudocode of the Gaussian parameterization in Figure 14 for better reproducibility. (A hedged parameterization sketch appears after this table.) |
| Open Source Code | Yes | The code is available at https://github.com/SkyworkAI/MVGamba. |
| Open Datasets | Yes | Training dataset. We obtain the multi-view images from Objaverse [7] for MVGamba pre-training. We use the well-adopted PSNR, SSIM, and LPIPS metrics for quantitative measurement on the GSO [76] dataset, following [18]. (A PSNR sketch appears after this table.) |
| Dataset Splits | No | The paper specifies training with 'input views' and 'supervision' views ('another random set of 6 views as supervision'), and then evaluates on 'test views' and datasets like GSO. However, it does not explicitly define a separate 'validation' dataset split by percentage or sample count, which is typically used for hyperparameter tuning or early stopping. |
| Hardware Specification | Yes | MVGamba is trained on 32 NVIDIA A100 (80G) GPUs with batch size 512 for about 2 days. This process...completes in less than 5 seconds (4.5 seconds for multi-view image generation and 0.03 seconds for predicting Gaussians and real-time rendering) on a single NVIDIA A800 (80G) GPU, making it well-suited for online deployment scenarios. |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, mixed-precision training with the BF16 data type, and the Open3D library for TSDF fusion, but it does not provide specific version numbers for programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or other key software dependencies. |
| Experiment Setup | Yes | MVGamba is trained on 32 NVIDIA A100 (80G) GPUs with batch size 512 for about 2 days. We adopt gradient checkpointing and mixed-precision training with the BF16 data type to ensure efficient training and inference. We use the AdamW optimizer with learning rate 1×10⁻³ and weight decay 0.05, following a linear learning-rate warm-up for 15 epochs with cosine decay to 1×10⁻⁵. The output Gaussians are rendered at 512×512 resolution for the mean-squared-error loss and resized to 256×256 for the LPIPS loss for memory efficiency. The trade-off coefficients balancing each loss were set as λmask = 1.0, λLPIPS = 0.6, and λreg = 0.001. We also follow the common practice [19] of clipping the gradient with a maximum norm of 1.0. Details of the MVGamba model configuration are included in Appendix D (Table 5). (A hedged configuration sketch follows this table.) |
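
The Gaussian parameter constraint itself (Figure 14 of the paper) is not reproduced in this report. Below is a minimal PyTorch sketch of a typical constraint head for Gaussian-splatting decoders: the 14-channel split, the activation choices, and the clamping ranges are common conventions assumed for illustration, not confirmed details of MVGamba's Figure 14.

```python
import torch
import torch.nn.functional as F

def constrain_gaussian_params(raw: torch.Tensor) -> torch.Tensor:
    """Map unconstrained network outputs to valid 3D Gaussian parameters.

    A hedged sketch, not the paper's exact pseudocode. Assumed layout:
    raw: (N, 14) = 3 (position) + 1 (opacity) + 3 (scale)
                 + 4 (rotation quaternion) + 3 (RGB color).
    """
    pos, opacity, scale, rot, rgb = raw.split([3, 1, 3, 4, 3], dim=-1)

    pos = torch.tanh(pos)                # keep centers in a bounded cube
    opacity = torch.sigmoid(opacity)     # opacity in (0, 1)
    scale = 0.1 * torch.sigmoid(scale)   # small positive scales for stability
    rot = F.normalize(rot, dim=-1)       # unit quaternion
    rgb = torch.sigmoid(rgb)             # colors in (0, 1)

    return torch.cat([pos, opacity, scale, rot, rgb], dim=-1)
```

For example, `constrain_gaussian_params(torch.randn(1024, 14))` returns a (1024, 14) tensor of valid Gaussian parameters ready for rasterization.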
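Of the three reported metrics, PSNR has a simple closed form, PSNR = 10·log₁₀(MAX² / MSE); the snippet below computes it for images in [0, 1]. SSIM and LPIPS rely on reference implementations (e.g., the `lpips` package) and are not sketched here.

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> torch.Tensor:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val**2 / mse)
```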
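The quoted experiment setup is concrete enough to sketch as a training configuration. The snippet below reconstructs the AdamW optimizer (lr 1×10⁻³, weight decay 0.05), the 15-epoch linear warm-up with cosine decay to 1×10⁻⁵, BF16 autocast, and gradient clipping at norm 1.0. The total epoch count, the stand-in model, and the dummy loss are assumptions; the paper's actual loss combines MSE, mask (λ = 1.0), LPIPS (λ = 0.6), and regularization (λ = 0.001) terms.

```python
import math
import torch

# Hyperparameters quoted from the paper's experiment setup.
LR, LR_MIN, WEIGHT_DECAY = 1e-3, 1e-5, 0.05
WARMUP_EPOCHS, TOTAL_EPOCHS = 15, 100  # total epoch count is an assumption
MAX_GRAD_NORM = 1.0

model = torch.nn.Linear(16, 16)  # stand-in for the MVGamba reconstructor
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)

def lr_at(epoch: int) -> float:
    """Linear warm-up for 15 epochs, then cosine decay from 1e-3 to 1e-5."""
    if epoch < WARMUP_EPOCHS:
        return LR * (epoch + 1) / WARMUP_EPOCHS
    t = (epoch - WARMUP_EPOCHS) / max(1, TOTAL_EPOCHS - WARMUP_EPOCHS)
    return LR_MIN + 0.5 * (LR - LR_MIN) * (1.0 + math.cos(math.pi * t))

for epoch in range(TOTAL_EPOCHS):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(epoch)
    x = torch.randn(4, 16)  # dummy batch standing in for multi-view renders
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):  # BF16 (paper: on GPU)
        loss = torch.nn.functional.mse_loss(model(x), x)  # stand-in for the full loss
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
    optimizer.step()
```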