PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation

Authors: Jialu Li, Mohit Bansal

NeurIPS 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Empirically, learning with our PanoGen environments achieves the new state-of-the-art on the Room-to-Room, Room-for-Room, and CVDN datasets." |
| Researcher Affiliation | Academia | "Jialu Li, Mohit Bansal, UNC Chapel Hill, {jialuli, mbansal}@cs.unc.edu" |
| Pseudocode | No | The paper describes the PanoGen method using textual descriptions and figures but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project website URL (https://pano-gen.github.io) but does not explicitly state that source code for the described methodology is released, nor does it link to a code repository within the paper's text. |
| Open Datasets | Yes | "We evaluate our agent on three datasets: Room-to-Room dataset (R2R) [2], Cooperative Vision-and-Dialog Navigation dataset (CVDN) [52], and Room-for-Room dataset (R4R) [21]." |
| Dataset Splits | Yes | "The training set contains 61 different room environments, while the unseen validation set and test set contain 11 and 18 room environments that are unseen during training." |
| Hardware Specification | Yes | "It takes 2 days on 6 A100s to generate all the environments. ... We train the speaker for 4 epochs on one A6000 GPU ... We train the model on one A6000 GPU." |
| Software Dependencies | Yes | "We caption all the view images in the training environments in the R2R dataset with BLIP-2-FlanT5-XXL. We utilize the stable-diffusion-v2.1-base model to generate the single view based on the caption only, and use the stable-diffusion-v1.5-inpainting model to outpaint the unseen observation for the rotated views. ... We build our speaker model based on mPLUG-base." |
| Experiment Setup | Yes | "We train the speaker for 4 epochs on one A6000 GPU with batch size 16 for two days. ... We pre-train the agent with batch size 64 for 150k iterations, and then fine-tune the agent with batch size 8 for 40k iterations." |
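For quick reference, the reported models and hyperparameters can be collected into one configuration sketch. The values are taken verbatim from the paper's quoted text; the dictionary structure and key names are illustrative choices of ours, not the authors' code.

```python
# Hypothetical consolidation of PanoGen's reported settings.
# All values come from the paper's quoted text; key names are ours.
PANOGEN_CONFIG = {
    "captioner": "BLIP-2-FlanT5-XXL",                  # captions R2R training views
    "view_generator": "stable-diffusion-v2.1-base",    # single view from caption only
    "outpainter": "stable-diffusion-v1.5-inpainting",  # outpaints rotated views
    "speaker_backbone": "mPLUG-base",
    "environment_generation": {"gpus": "6x A100", "wall_time_days": 2},
    "speaker_training": {"epochs": 4, "batch_size": 16,
                         "gpu": "A6000", "wall_time_days": 2},
    "agent_pretraining": {"batch_size": 64, "iterations": 150_000},
    "agent_finetuning": {"batch_size": 8, "iterations": 40_000},
}

# Sanity check: fine-tuning runs far fewer iterations than pre-training.
assert (PANOGEN_CONFIG["agent_pretraining"]["iterations"]
        > PANOGEN_CONFIG["agent_finetuning"]["iterations"])
```

A consolidated record like this makes it easy to spot what a re-implementation would still need to pin down (e.g. optimizer, learning rate, and diffusion sampling settings are not quoted above).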