SGAM: Building a Virtual 3D World through Simultaneous Generation and Mapping

Authors: Yuan Shen, Wei-Chiu Ma, Shenlong Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the CLEVR-Infinite and Google Earth datasets demonstrate that our method can generate consistent, realistic, and geometrically plausible scenes that compare favorably to existing view synthesis methods. Our model can be trained from RGB-D sequences without access to the complete 3D scene structure. We validate the efficacy of SGAM on two large-scale 3D scene datasets, CLEVR-Infinite and Google Earth. Both benchmarks contain substantially larger 3D scenes than existing datasets, which allows us to benchmark large-scale 3D scene generation. We evaluate standard metrics for perceptual image quality, such as PSNR, SSIM (75), LPIPS (82), IS (63), and FID (27); a sketch of the per-frame metric computation appears below. We further benchmark the realism of the generated 3D scenes in terms of Jensen-Shannon divergence and maximum mean discrepancy (22). Experimental results suggest that 1) our method produces more realistic results than existing single-image novel view synthesis methods, and 2) our approach generates a more meaningful 3D world than prior perpetual view generation algorithms.
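PSNR and SSIM are deterministic reconstruction metrics, while LPIPS, IS, and FID additionally require pretrained networks (e.g., via the lpips and torchmetrics packages). The paper's evaluation script is not released, so the following is only a minimal sketch of per-frame PSNR/SSIM, assuming 8-bit RGB frames; the function and data-range choices are assumptions, not the authors' code:

```python
# Minimal sketch of per-frame PSNR/SSIM evaluation for 8-bit RGB frames.
# Illustrative only; SGAM's actual evaluation pipeline is not public.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(pred: np.ndarray, target: np.ndarray) -> dict:
    """Compute PSNR and SSIM between a predicted and a ground-truth frame.

    Both inputs are H x W x 3 uint8 arrays.
    """
    psnr = peak_signal_noise_ratio(target, pred, data_range=255)
    # channel_axis=-1 treats the last axis as color channels (skimage >= 0.19).
    ssim = structural_similarity(target, pred, data_range=255, channel_axis=-1)
    return {"psnr": psnr, "ssim": ssim}
```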
Researcher Affiliation | Academia | Yuan Shen¹, Wei-Chiu Ma², Shenlong Wang¹ (¹University of Illinois at Urbana-Champaign, ²Massachusetts Institute of Technology)
Pseudocode | No | The paper describes the system architecture and process flow through textual descriptions and diagrams (e.g., Figure 2) but does not include any pseudocode or algorithm blocks; a hedged sketch of the implied loop appears below.
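Since the paper itself provides no algorithm block, the following is a minimal, non-authoritative sketch of the simultaneous generation-and-mapping loop suggested by the title and by Figure 2's process flow. Every function named here is a hypothetical placeholder, not the authors' API:

```python
# Hypothetical outline of SGAM's simultaneous generation-and-mapping loop,
# reconstructed only from the paper's high-level description. The functions
# next_camera_pose, render_from_map, generate_next_view, and fuse_into_map
# are placeholders.
def build_world(initial_rgbd, initial_pose, num_steps):
    world_map = fuse_into_map(None, initial_rgbd, initial_pose)  # seed the 3D map
    pose = initial_pose
    for _ in range(num_steps):
        pose = next_camera_pose(pose)               # pick the next viewpoint
        partial = render_from_map(world_map, pose)  # project known geometry into that view
        rgbd = generate_next_view(partial, pose)    # generative model fills unseen regions
        world_map = fuse_into_map(world_map, rgbd, pose)  # map the new frame back into 3D
    return world_map
```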
Open Source Code | No | Our project page is available at https://yshen47.github.io/sgam/. (3a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We plan to release the code after acceptance.
Open Datasets | Yes | We validate the efficacy of SGAM on two large-scale 3D scene datasets, CLEVR-Infinite and Google Earth. We exploit Blender and the assets from CLEVR (31) to render an extremely large-scale synthetic benchmark, which we call CLEVR-Infinite.
Dataset Splits | Yes | Table 1: Comparison of our method and baseline methods on the CLEVR-Infinite validation set.
Hardware Specification | Yes | We report the runtime of a single prediction step, benchmarked on an Nvidia Quadro 8000. We implement SGAM in PyTorch and train with a batch size of 8 on 2 Nvidia A40 GPUs until convergence. A sketch of such a per-step GPU timing measurement follows.
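Timing a single prediction step on a GPU requires explicit synchronization, since CUDA kernels launch asynchronously. The paper's benchmarking script is not available, so this is only a minimal sketch of how such a measurement is typically done in PyTorch; `model` and `inputs` are hypothetical stand-ins for SGAM's prediction step:

```python
# Sketch of per-step GPU runtime measurement in PyTorch.
# `model` and `inputs` are hypothetical stand-ins, not the authors' code.
import time
import torch

@torch.no_grad()
def time_prediction_step(model, inputs, warmup=10, iters=100):
    for _ in range(warmup):      # warm up kernels and the memory allocator
        model(inputs)
    torch.cuda.synchronize()     # ensure all warm-up work has finished
    start = time.perf_counter()
    for _ in range(iters):
        model(inputs)
    torch.cuda.synchronize()     # wait for all queued kernels to complete
    return (time.perf_counter() - start) / iters
```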
Software Dependencies | No | We implement SGAM in PyTorch and train with a batch size of 8 on 2 Nvidia A40 GPUs until convergence. We use Adam to train our entire network. The paper mentions software tools such as PyTorch and the Adam optimizer but does not specify their version numbers.
Experiment Setup | Yes | We set the vocabulary size to 16384. We implement SGAM in PyTorch and train with a batch size of 8 on 2 Nvidia A40 GPUs until convergence. For other baselines, we maximize the batch size to fit GPU memory and train until convergence. GFVS-implicit, GFVS-explicit, and SGAM share the same learning rate of 0.0625 for the second-stage training. To stabilize training, we first train the encoder-decoder network without quantization for the first 30k iterations, then add quantization back and train the entire network for another 60k iterations, using the straight-through gradient estimator trick (3) to back-propagate gradients. A sketch of the straight-through trick follows.
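The straight-through estimator mentioned above lets gradients flow through the non-differentiable codebook lookup by copying the decoder's gradient directly onto the encoder output. Below is a minimal sketch of the standard formulation; the 16384-entry codebook size comes from the quote, while the embedding dimension and module layout are assumptions, not the authors' code:

```python
# Minimal sketch of vector quantization with the straight-through estimator.
# Codebook size 16384 follows the paper; embedding dim 256 is an assumption.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=16384, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):  # z: (..., dim) continuous encoder output
        flat = z.reshape(-1, z.shape[-1])
        # Find the nearest codebook entry for each latent vector.
        dists = torch.cdist(flat, self.codebook.weight)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx).reshape(z.shape)
        # Straight-through: the forward pass uses z_q, while the backward
        # pass copies the gradient of z_q onto z, bypassing the argmin.
        z_q = z + (z_q - z).detach()
        return z_q, idx
```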