Generative View Synthesis: From Single-view Semantics to Novel-view Images

Authors: Tewodros Amberbir Habtegebrial, Varun Jampani, Orazio Gallo, Didier Stricker

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We perform extensive experimental analysis on three different multi-view datasets: CARLA [14], Cityscapes [12], and Virtual-KITTI-2 [3]. We show both qualitatively and quantitatively that our approach, which compares favorably with strong baseline techniques, produces novel-view images that are geometrically and semantically consistent."
Researcher Affiliation | Collaboration | Tewodros Habtegebrial (TU Kaiserslautern, DFKI), Varun Jampani (Google Research), Orazio Gallo (NVIDIA), Didier Stricker (TU Kaiserslautern, DFKI)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "For code and additional results, visit the project page at https://gvsnet.github.io"
Open Datasets | Yes | "We perform experiments on three different datasets: CARLA [14], Virtual-KITTI-2 [3] and Cityscapes [12]."
Dataset Splits | No | The paper describes training the networks and reports evaluation metrics, but it does not give explicit training/validation/test split percentages or sample counts for the datasets used.
Hardware Specification | Yes | "The entire network training does not fit on NVIDIA GTX-2080-Ti GPUs, which is what we use for training."
Software Dependencies | No | The paper states: "We implemented our model in PyTorch [26] and use the Adam [22] optimizer for training." While PyTorch is mentioned, no specific version number is given, nor are versions provided for any other software libraries.
Experiment Setup | Yes | "For our experiments, we used k = 3 lifted semantics layers, m = 32 MPI planes, and f = 20 appearance features per pixel. We implemented our model in PyTorch [26] and use the Adam [22] optimizer for training. In all of our experiments we use images at a resolution of 256 x 256 pixels. We train GVSNet in two stages. In the first stage, we pre-train SUN with the target segmentation and depth losses. In the second stage, we train LTN and ADN with the target color loss, while keeping the SUN fixed."
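
Since the paper provides no pseudocode, the two-stage schedule quoted in the experiment-setup row can be illustrated with the minimal PyTorch sketch below. This is a toy illustration under stated assumptions, not the authors' implementation: the SUN/LTN/ADN stand-in modules, the number of semantic classes, the specific loss functions, and the random batch are placeholders, and the MPI rendering step is omitted entirely; only the hyperparameters (k = 3, m = 32, f = 20, 256 x 256 images, Adam) and the freeze-SUN-during-stage-two logic come from the quoted setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters quoted in the experiment-setup row above.
K = 3           # k: lifted semantics layers
M = 32          # m: MPI planes
FEATS = 20      # f: appearance features per pixel
RES = 256       # images are 256 x 256 pixels
N_CLASSES = 13  # assumed number of semantic classes (placeholder)

# Toy stand-ins for the paper's sub-networks; the real SUN/LTN/ADN are
# deeper networks coupled through MPI rendering, which is omitted here.
class SUN(nn.Module):  # semantics -> lifted (layered) semantics + per-plane alphas
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(N_CLASSES, K * N_CLASSES + M, 3, padding=1)
    def forward(self, sem):
        out = self.net(sem)
        return out[:, :K * N_CLASSES], out[:, K * N_CLASSES:]

class LTN(nn.Module):  # lifted semantics -> appearance features
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(K * N_CLASSES, FEATS, 3, padding=1)
    def forward(self, lifted):
        return self.net(lifted)

class ADN(nn.Module):  # appearance features -> RGB image
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(FEATS, 3, 3, padding=1)
    def forward(self, feats):
        return torch.sigmoid(self.net(feats))

sun, ltn, adn = SUN(), LTN(), ADN()

# One random toy batch standing in for CARLA / Cityscapes / Virtual-KITTI-2 samples.
sem = torch.randn(1, N_CLASSES, RES, RES)
target_seg = torch.randint(0, N_CLASSES, (1, RES, RES))
target_depth = torch.rand(1, 1, RES, RES)
target_rgb = torch.rand(1, 3, RES, RES)

# Stage 1: pre-train SUN with target segmentation and depth losses.
opt1 = torch.optim.Adam(sun.parameters())
lifted, alpha = sun(sem)
seg_logits = lifted[:, :N_CLASSES]            # toy readout in place of MPI rendering
pred_depth = alpha.mean(dim=1, keepdim=True)  # toy depth in place of MPI rendering
loss1 = F.cross_entropy(seg_logits, target_seg) + F.l1_loss(pred_depth, target_depth)
opt1.zero_grad()
loss1.backward()
opt1.step()

# Stage 2: train LTN and ADN with the target color loss, keeping SUN fixed.
for p in sun.parameters():
    p.requires_grad = False
sun.eval()
opt2 = torch.optim.Adam(list(ltn.parameters()) + list(adn.parameters()))
with torch.no_grad():
    lifted, _ = sun(sem)
pred_rgb = adn(ltn(lifted))
loss2 = F.l1_loss(pred_rgb, target_rgb)
opt2.zero_grad()
loss2.backward()
opt2.step()
```

Freezing SUN in the second stage (requires_grad = False plus eval()) mirrors the quoted statement that LTN and ADN are trained with the target color loss while SUN is kept fixed.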