HORIZON: High-Resolution Semantically Controlled Panorama Synthesis

Authors: Kun Yan, Lei Ji, Chenfei Wu, Jian Liang, Ming Zhou, Nan Duan, Shuai Ma

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We rigorously evaluate our methodology on a diverse array of indoor and outdoor datasets, establishing its superiority over recent related work, in terms of both quantitative and qualitative performance metrics.
Researcher Affiliation | Collaboration | Kun Yan¹, Lei Ji², Chenfei Wu², Jian Liang³, Ming Zhou⁴, Nan Duan², Shuai Ma¹ (¹SKLSDE Lab, Beihang University; ²Microsoft Research Asia; ³Peking University; ⁴Langboat Technology)
Pseudocode | No | The paper describes its methods in text and uses diagrams (e.g., Figure 2) but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing the source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We evaluate our model on the high-resolution StreetLearn dataset (Mirowski et al. 2019), which consists of Google Street View panoramas.
Dataset Splits | No | The Pittsburgh dataset contains 58k images, split into 52.2k for training and 5.8k for testing. The paper mentions training and testing splits but does not explicitly state a validation split. (See the split sketch after the table.)
Hardware Specification | Yes | The experiments are conducted on 64 V100 GPUs, each with 32 GiB memory.
Software Dependencies | No | The paper mentions using components like a 'pretrained CLIP visual module' but does not provide specific version numbers for any software dependencies or libraries required for reproduction.
Experiment Setup | Yes | Every equirectangular projected panoramic image with a resolution of 768x1,536 is first divided into 3x6 = 18 RGB view patches, each with a resolution of 256x256. Then we train a VQGAN (Esser, Rombach, and Ommer 2021) on every view patch separately. (See the patching sketch after the table.)
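The Dataset Splits row reports 52.2k training and 5.8k test images out of 58k, i.e., a 90/10 split. The sketch below is a minimal sanity check of those counts; the paper does not describe how images are assigned to splits, so the seeded shuffle and the placeholder image IDs are assumptions, not the authors' procedure.

```python
import random

# Hypothetical reconstruction of the reported Pittsburgh split sizes:
# 58k images -> 52.2k train / 5.8k test. The assignment rule below
# (seeded random shuffle) is an assumption; the paper does not state one.
image_ids = list(range(58_000))      # stand-ins for the real image IDs
random.Random(0).shuffle(image_ids)  # fixed seed so the split is repeatable

n_test = len(image_ids) // 10        # 10% held out -> 5,800 test images
test_ids, train_ids = image_ids[:n_test], image_ids[n_test:]
assert (len(train_ids), len(test_ids)) == (52_200, 5_800)
```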
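The Experiment Setup row describes dividing each 768x1,536 equirectangular panorama into 3x6 = 18 view patches of 256x256 before VQGAN training. Since no official code is released, the following PyTorch sketch of that patching step is an assumption: the function name `split_into_view_patches`, the channel-first tensor layout, and the use of `torch.Tensor.unfold` are illustrative choices, not the authors' implementation.

```python
import torch

def split_into_view_patches(panorama: torch.Tensor, patch: int = 256) -> torch.Tensor:
    """Split a (C, 768, 1536) panorama into 18 patches of shape (C, 256, 256)."""
    c, h, w = panorama.shape
    # 768 / 256 = 3 rows of patches, 1536 / 256 = 6 columns, 3 * 6 = 18 total
    assert h % patch == 0 and w % patch == 0
    patches = panorama.unfold(1, patch, patch).unfold(2, patch, patch)
    # (C, 3, 6, 256, 256) -> (18, C, 256, 256); each patch would then be
    # used for VQGAN training separately, per the quoted setup.
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, patch, patch)

pano = torch.rand(3, 768, 1536)          # dummy equirectangular panorama
views = split_into_view_patches(pano)
print(views.shape)                       # torch.Size([18, 3, 256, 256])
```

Using `unfold` twice keeps the patching vectorized rather than looping over the 3x6 grid; any equivalent reshape-based slicing would serve the same purpose.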