Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation
Authors: Zhuqiang Lu, Kun Hu, Chaoyue Wang, Lei Bai, Zhiyong Wang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on two commonly used 360-degree image datasets for both indoor and outdoor settings demonstrate the state-of-the-art performance of our proposed method. |
| Researcher Affiliation | Collaboration | Zhuqiang Lu (1), Kun Hu (1,*), Chaoyue Wang (2), Lei Bai (3), Zhiyong Wang (1) — (1) The University of Sydney, (2) JD.com, (3) Shanghai AI Laboratory |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/zhuqiangLu/AOG-NET-360. |
| Open Datasets | Yes | we evaluate our proposed method with the LAVAL indoor HDR dataset (Gardner et al. 2017) for the 360-degree indoor image generation setting...For the outdoor setting, we utilize the LAVAL outdoor HDR dataset (Zhang and Lalonde 2017) |
| Dataset Splits | Yes | we used the official training and testing split in our experiments, in which we have 1,921 training samples and 312 testing samples. For the outdoor setting, we randomly sample 170 images as the training split and 40 images for testing purpose. |
| Hardware Specification | Yes | All experiments were conducted on an Nvidia RTX 3090. |
| Software Dependencies | No | In our experiment, we adopted the pretrained Stable Diffusion generative prior for each autoregressive generation step. In addition, we utilized the visual encoder and the text encoder of OpenCLIP (Cherti et al. 2023) for E360 and Etext, respectively. We utilized T2I-Adapter (Mou et al. 2023) as the architecture for NFoV guidance encoder ENFoV and omnigeometry guidance encoder Egeometry. |
| Experiment Setup | Yes | AOG-Net was trained using an AdamW optimizer (Loshchilov and Hutter 2019) with β1 = 0.9 and β2 = 0.999. It was trained for 240 epochs, with learning rate 1 × 10−4 and batch size 1. For inference, we leveraged DPM-Solver++ (Lu et al. 2023) as sampler with a step set to 25 and classifier-free-guidance (Ho and Salimans 2022) scale set to 2.5. |
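The classifier-free guidance scale of 2.5 quoted above refers to the standard rule that, at each sampling step, pushes the conditional noise prediction away from the unconditional one. A minimal sketch of that combination rule (the function name and the scalar inputs are illustrative, not from the paper, which applies this to full noise tensors inside the diffusion sampler):

```python
def cfg_combine(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Classifier-free guidance (Ho and Salimans 2022):
    eps = eps_uncond + scale * (eps_cond - eps_uncond).
    A scale of 1.0 recovers the plain conditional prediction;
    larger scales strengthen the conditioning signal."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Hypothetical scalar predictions; scale = 2.5 as reported in the paper.
guided = cfg_combine(1.0, 2.0, 2.5)
print(guided)  # 1.0 + 2.5 * (2.0 - 1.0) = 3.5
```

In practice `eps_uncond` and `eps_cond` would be noise tensors from two forward passes of the denoiser (with an empty prompt and the text prompt, respectively); the arithmetic is identical elementwise.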