Generative View Synthesis: From Single-view Semantics to Novel-view Images
Authors: Tewodros Amberbir Habtegebrial, Varun Jampani, Orazio Gallo, Didier Stricker
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experimental analysis on three different multi-view datasets: CARLA [14], Cityscapes [12], and Virtual-KITTI-2 [3]. We show both qualitatively and quantitatively that our approach, which compares favorably with strong baseline techniques, produces novel-view images that are geometrically and semantically consistent. |
| Researcher Affiliation | Collaboration | Tewodros Habtegebrial (TU Kaiserslautern, DFKI), Varun Jampani (Google Research), Orazio Gallo (NVIDIA), Didier Stricker (TU Kaiserslautern, DFKI) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | For code and additional results, visit the project page at https://gvsnet.github.io |
| Open Datasets | Yes | We perform experiments on three different datasets: CARLA [14], Virtual-KITTI-2 [3] and Cityscapes [12]. |
| Dataset Splits | No | The paper mentions training networks and evaluation metrics but does not explicitly provide specific training/validation/test split percentages or sample counts for the datasets used. |
| Hardware Specification | Yes | Training the entire network at once does not fit in memory on the NVIDIA RTX 2080 Ti GPUs, which is what we use for training. |
| Software Dependencies | No | The paper states: "We implemented our model in PyTorch [26] and use the Adam [22] optimizer for training." While PyTorch is mentioned, a specific version number is not provided, nor are versions for any other software libraries. |
| Experiment Setup | Yes | For our experiments, we used k = 3 lifted semantics layers, m = 32 MPI planes, and f = 20 appearance features per pixel. We implemented our model in PyTorch [26] and use the Adam [22] optimizer for training. In all of our experiments we use images at a resolution of 256 × 256 pixels. We train GVSNet in two stages. In the first stage, we pre-train SUN with the target segmentation and depth losses. In the second stage, we train LTN and ADN with the target color loss, while keeping the SUN fixed. |
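
The two-stage schedule described in the Experiment Setup row can be summarized in code. The following is a minimal sketch, assuming hypothetical interfaces for the `sun`, `ltn`, and `adn` modules (the paper names SUN, LTN, and ADN but does not specify their signatures), hypothetical loss callables, and hypothetical batch dictionary keys; only the hyperparameter values, the use of Adam, and the stage ordering come from the paper.

```python
import torch
from torch import optim

# Hyperparameters reported in the paper.
NUM_LIFTED_SEMANTIC_LAYERS = 3   # k
NUM_MPI_PLANES = 32              # m
NUM_APPEARANCE_FEATURES = 20     # f, per pixel
IMAGE_SIZE = 256                 # training images are 256 x 256


def make_adam(*modules):
    # The learning rate is not reported in the paper; Adam defaults are assumed.
    return optim.Adam([p for m in modules for p in m.parameters()])


def train_gvsnet(sun, ltn, adn, loader, stage1_steps, stage2_steps,
                 sun_loss, color_loss):
    """Two-stage schedule: (1) pre-train SUN with target segmentation and
    depth losses; (2) train LTN and ADN with the target color loss while
    keeping SUN fixed."""
    # Stage 1: train SUN only.
    opt = make_adam(sun)
    for _, batch in zip(range(stage1_steps), loader):
        loss = sun_loss(sun, batch)  # segmentation + depth terms (hypothetical callable)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: train LTN and ADN; SUN stays frozen.
    for p in sun.parameters():
        p.requires_grad_(False)
    opt = make_adam(ltn, adn)
    for _, batch in zip(range(stage2_steps), loader):
        with torch.no_grad():
            lifted = sun(batch["source_semantics"])  # lifted semantics (hypothetical interface)
        prediction = adn(ltn(lifted))                # novel-view image (hypothetical interface)
        loss = color_loss(prediction, batch["target_image"])
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because no learning rate is reported, the sketch falls back to Adam's defaults; freezing SUN's parameters in the second stage mirrors the paper's statement that SUN is kept fixed while LTN and ADN are trained with the target color loss.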