PanoDiffusion: 360-degree Panorama Outpainting via Diffusion

Authors: Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Results show that our PanoDiffusion not only significantly outperforms state-of-the-art methods on RGB-D panorama outpainting by producing diverse, well-structured results for different types of masks, but can also synthesize high-quality depth panoramas to provide realistic 3D indoor models." (Section 4, Experiments)
Researcher Affiliation | Academia | Tianhao Wu¹, Chuanxia Zheng² & Tat-Jen Cham¹; ¹S-Lab, Nanyang Technological University (tianhao001@e.ntu.edu.sg, astjcham@ntu.edu.sg); ²University of Oxford (cxzheng@robots.ox.ac.uk)
Pseudocode | No | The paper does not contain a clearly labeled "Pseudocode" or "Algorithm" block.
Open Source Code | Yes | "The supplementary materials are organized as follows: 1. A video is added to illuminate the work with more results. 2. The reproducible code is included. 3. An additional PDF for implementation, training, and metric details, as well as more quantitative and qualitative results."
Open Datasets | Yes | "Dataset. We evaluated our model on the Structured3D dataset (Zheng et al., 2020), which provides 360° indoor RGB-D data following equirectangular projection at a 512 × 1024 resolution."
Dataset Splits | Yes | "We split the dataset into 16930 train, 2116 validation, and 2117 test instances." (See the dataset sketch after the table.)
Hardware Specification | No | The paper mentions "on the same devices" when comparing training times, but gives no specific hardware details such as GPU/CPU models or memory.
Software Dependencies | No | The paper does not provide version numbers for its software dependencies. PyTorch and CUDA are implied through references to official implementations and pre-trained models, but no versions are stated.
Experiment Setup | Yes | "For training G, we use a weighted sum of the pixel-wise L1 loss and adversarial loss. The pixel-wise L1 loss, denoted L_pixel, measures the difference between the GT and the output panorama. ... Here the value of λ is set to 20 during training. The training of the VAEs is exactly the same as in Rombach et al. (2022), with downsampling factor f = 4." (See the loss sketch after the table.)
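
For concreteness, here is a minimal sketch of how the reported dataset handling could be reproduced on a local copy of Structured3D. Only the 16930/2116/2117 split counts and the 512 × 1024 equirectangular resolution come from the paper; the function names, file layout, random seed, and depth encoding are assumptions, not the authors' released code.

# Hypothetical split/loading sketch for Structured3D; the counts and
# resolution are taken from the paper, everything else is assumed.
import random

import numpy as np
from PIL import Image

N_TRAIN, N_VAL, N_TEST = 16930, 2116, 2117  # split counts reported in the paper

def split_structured3d(scene_ids, seed=0):
    """Deterministically shuffle, then carve out train/val/test lists."""
    ids = sorted(scene_ids)           # canonical order before shuffling
    random.Random(seed).shuffle(ids)  # fixed seed -> reproducible split
    assert len(ids) == N_TRAIN + N_VAL + N_TEST  # 16930 + 2116 + 2117 = 21163
    return (ids[:N_TRAIN],
            ids[N_TRAIN:N_TRAIN + N_VAL],
            ids[N_TRAIN + N_VAL:])

def load_rgbd(rgb_path, depth_path):
    """Load one equirectangular RGB-D pair at the paper's 512 x 1024 size."""
    rgb = np.asarray(Image.open(rgb_path).convert("RGB"))
    # Structured3D ships depth as 16-bit PNGs; treating it as float here is
    # an assumption about how the paper consumed it.
    depth = np.asarray(Image.open(depth_path), dtype=np.float32)
    assert rgb.shape[:2] == (512, 1024)  # equirectangular H x W per the paper
    return rgb, depth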
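
The loss description in the Experiment Setup row maps onto a standard objective of the form L_G = L_adv + λ · L_pixel with λ = 20. Below is a minimal PyTorch sketch under that reading; the non-saturating adversarial term and the discriminator interface are assumptions, since the paper only states that a weighted sum of the pixel-wise L1 and adversarial losses is used.

import torch
import torch.nn.functional as F

LAMBDA_PIXEL = 20.0  # lambda = 20, as reported in the paper

def generator_loss(fake_pano, gt_pano, disc_fake_logits):
    """Weighted sum of pixel-wise L1 and an adversarial term: L_adv + lambda * L_pixel."""
    # Pixel-wise L1 between the generated and ground-truth panoramas (L_pixel).
    l_pixel = F.l1_loss(fake_pano, gt_pano)
    # Non-saturating GAN loss on the discriminator's logits for the fake
    # panorama; the exact adversarial formulation is not specified in the paper.
    l_adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    return l_adv + LAMBDA_PIXEL * l_pixel

Whether λ scales the pixel term or the adversarial term is not stated explicitly; the convention above (scaling L_pixel) is the common one for reconstruction-heavy outpainting objectives.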