SGAM: Building a Virtual 3D World through Simultaneous Generation and Mapping

Authors: Yuan Shen, Wei-Chiu Ma, Shenlong Wang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the CLEVR-Infinite and Google Earth datasets demonstrate that our method can generate consistent, realistic, and geometrically plausible scenes that compare favorably to existing view synthesis methods. Our model can be trained from RGB-D sequences without access to the complete 3D scene structure. We validate the efficacy of SGAM on two large-scale 3D scene datasets, CLEVR-Infinite and Google Earth. Both benchmarks contain substantially larger 3D scenes than existing datasets, which allows us to benchmark large-scale 3D scene generation. We evaluate standard metrics for perceptual image quality, such as PSNR, SSIM (75), LPIPS (82), IS (63), and FID (27); a sketch of the per-frame metric computation appears below. We further benchmark the realism of the generated 3D scenes in terms of Jensen-Shannon divergence and maximum mean discrepancy (22). Experimental results suggest that 1) our method produces more realistic results than existing single-image novel view synthesis methods, and 2) our approach generates a more meaningful 3D world than prior perpetual view generation algorithms.
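PSNR and SSIM are deterministic reconstruction metrics, while LPIPS, IS, and FID additionally require pretrained networks (e.g., via the lpips and torchmetrics packages). The paper's evaluation script is not released, so the following is only a minimal sketch of per-frame PSNR/SSIM, assuming 8-bit RGB frames; the function and data-range choices are assumptions, not the authors' code:

```python
# Minimal sketch of per-frame PSNR/SSIM evaluation for 8-bit RGB frames.
# Illustrative only; SGAM's actual evaluation pipeline is not public.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(pred: np.ndarray, target: np.ndarray) -> dict:
    """Compute PSNR and SSIM between a predicted and a ground-truth frame.

    Both inputs are H x W x 3 uint8 arrays.
    """
    psnr = peak_signal_noise_ratio(target, pred, data_range=255)
    # channel_axis=-1 treats the last axis as color channels (skimage >= 0.19).
    ssim = structural_similarity(target, pred, data_range=255, channel_axis=-1)
    return {"psnr": psnr, "ssim": ssim}
```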
Researcher Affiliation | Academia | Yuan Shen¹, Wei-Chiu Ma², Shenlong Wang¹ (¹University of Illinois at Urbana-Champaign, ²Massachusetts Institute of Technology)
Pseudocode | No | The paper describes the system architecture and process flow through textual descriptions and diagrams (e.g., Figure 2) but does not include any pseudocode or algorithm blocks; a hedged sketch of the implied loop appears below.
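Since the paper itself provides no algorithm block, the following is a minimal, non-authoritative sketch of the simultaneous generation-and-mapping loop suggested by the title and by Figure 2's process flow. Every function named here is a hypothetical placeholder, not the authors' API:

```python
# Hypothetical outline of SGAM's simultaneous generation-and-mapping loop,
# reconstructed only from the paper's high-level description. The functions
# next_camera_pose, render_from_map, generate_next_view, and fuse_into_map
# are placeholders.
def build_world(initial_rgbd, initial_pose, num_steps):
    world_map = fuse_into_map(None, initial_rgbd, initial_pose)  # seed the 3D map
    pose = initial_pose
    for _ in range(num_steps):
        pose = next_camera_pose(pose)               # pick the next viewpoint
        partial = render_from_map(world_map, pose)  # project known geometry into that view
        rgbd = generate_next_view(partial, pose)    # generative model fills unseen regions
        world_map = fuse_into_map(world_map, rgbd, pose)  # map the new frame back into 3D
    return world_map
```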
Open Source Code | No | Our project page is available at https://yshen47.github.io/sgam/. (3a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We plan to release the code after acceptance.
Open Datasets | Yes | We validate the efficacy of SGAM on two large-scale 3D scene datasets, CLEVR-Infinite and Google Earth. We exploit Blender and the assets from CLEVR (31) to render an extremely large-scale synthetic benchmark, which we call CLEVR-Infinite.
Dataset Splits | Yes | Table 1: Comparison of our method and baseline methods on the CLEVR-Infinite validation set.
Hardware Specification | Yes | We report the runtime of a single prediction step, benchmarked on an Nvidia Quadro 8000. We implement SGAM in PyTorch and train with a batch size of 8 on 2 Nvidia A40 GPUs until convergence. A sketch of such a per-step GPU timing measurement follows.
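Timing a single prediction step on a GPU requires explicit synchronization, since CUDA kernels launch asynchronously. The paper's benchmarking script is not available, so this is only a minimal sketch of how such a measurement is typically done in PyTorch; `model` and `inputs` are hypothetical stand-ins for SGAM's prediction step:

```python
# Sketch of per-step GPU runtime measurement in PyTorch.
# `model` and `inputs` are hypothetical stand-ins, not the authors' code.
import time
import torch

@torch.no_grad()
def time_prediction_step(model, inputs, warmup=10, iters=100):
    for _ in range(warmup):      # warm up kernels and the memory allocator
        model(inputs)
    torch.cuda.synchronize()     # ensure all warm-up work has finished
    start = time.perf_counter()
    for _ in range(iters):
        model(inputs)
    torch.cuda.synchronize()     # wait for all queued kernels to complete
    return (time.perf_counter() - start) / iters
```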
Software Dependencies | No | We implement SGAM in PyTorch and train with a batch size of 8 on 2 Nvidia A40 GPUs until convergence. We use Adam to train our entire network. The paper mentions software tools such as PyTorch and the Adam optimizer but does not specify their version numbers.
Experiment Setup | Yes | We set the vocabulary size to 16384. We implement SGAM in PyTorch and train with a batch size of 8 on 2 Nvidia A40 GPUs until convergence. For other baselines, we maximize the batch size to fit GPU memory and train until convergence. GFVS-implicit, GFVS-explicit, and SGAM share the same learning rate of 0.0625 for the second-stage training. To stabilize training, we first train the encoder-decoder network without quantization for the first 30k iterations, then add quantization back and train the entire network for another 60k iterations, using the straight-through gradient estimator trick (3) to back-propagate gradients. A sketch of the straight-through trick follows.
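The straight-through estimator mentioned above lets gradients flow through the non-differentiable codebook lookup by copying the decoder's gradient directly onto the encoder output. Below is a minimal sketch of the standard formulation; the 16384-entry codebook size comes from the quote, while the embedding dimension and module layout are assumptions, not the authors' code:

```python
# Minimal sketch of vector quantization with the straight-through estimator.
# Codebook size 16384 follows the paper; embedding dim 256 is an assumption.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=16384, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):  # z: (..., dim) continuous encoder output
        flat = z.reshape(-1, z.shape[-1])
        # Find the nearest codebook entry for each latent vector.
        dists = torch.cdist(flat, self.codebook.weight)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx).reshape(z.shape)
        # Straight-through: the forward pass uses z_q, while the backward
        # pass copies the gradient of z_q onto z, bypassing the argmin.
        z_q = z + (z_q - z).detach()
        return z_q, idx
```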