reproducibilityindex.ai

PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation

Authors: Jialu Li, Mohit Bansal

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, learning with our PANOGEN environments achieves the new state-of-the-art on the Room-to-Room, Room-for-Room, and CVDN datasets.
Researcher Affiliation	Academia	Jialu Li Mohit Bansal UNC Chapel Hill {jialuli, mbansal}@cs.unc.edu
Pseudocode	No	The paper describes the PANOGEN method using textual descriptions and figures but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	The paper provides a project website URL (https://pano-gen.github.io) but does not explicitly state that source code for the described methodology is released or provide a direct link to a code repository within the paper's text.
Open Datasets	Yes	We evaluate our agent on three datasets: Room-to-Room dataset (R2R) [2], Cooperative Vision-and-Dialog Navigation dataset (CVDN) [52], and Room-for-Room dataset (R4R) [21].
Dataset Splits	Yes	The training set contains 61 different room environments, while the unseen validation set and test set contain 11 and 18 room environments, respectively, that are unseen during training.
Hardware Specification	Yes	It takes 2 days on 6 A100s to generate all the environments. ... We train the speaker for 4 epochs on one A6000 GPU... We train the model on one A6000 GPU.
Software Dependencies	Yes	We caption all the view images in the training environments in R2R dataset with BLIP-2-FlanT5-xxL. We utilize stable-diffusion-v2.1 base model to generate the single view based on caption only, and use stable-diffusion-v1.5-inpainting model to outpaint the unseen observation for the rotated views. ... We build our speaker model based on mPLUG-base.
Experiment Setup	Yes	We train the speaker for 4 epochs on one A6000 GPU with batch size 16 for two days. ... We pre-train the agent with batch size 64 for 150k iterations, and then fine-tune the agent with batch size 8 for 40k iterations.
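
The agent schedule quoted in the Experiment Setup row can be turned into a rough count of training samples processed per stage. The sketch below is illustrative only: the `Stage` class, the `STAGES` list, and the `summarize` helper are hypothetical names, not part of the paper or any released code. It assumes each iteration consumes exactly one batch with no gradient accumulation, which the quoted text does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical container, not from the paper)."""
    name: str
    batch_size: int
    iterations: int

    @property
    def samples_seen(self) -> int:
        # Assumption: one batch per iteration, no gradient accumulation.
        return self.batch_size * self.iterations


# Batch sizes and iteration counts as quoted in the Experiment Setup row.
STAGES = [
    Stage("pretrain", batch_size=64, iterations=150_000),
    Stage("finetune", batch_size=8, iterations=40_000),
]


def summarize(stages):
    """Map each stage name to the number of samples it processes."""
    return {stage.name: stage.samples_seen for stage in stages}


if __name__ == "__main__":
    for name, count in summarize(STAGES).items():
        print(f"{name}: {count:,} samples")
```

Under these assumptions, pre-training processes 64 × 150,000 = 9,600,000 samples and fine-tuning processes 8 × 40,000 = 320,000 samples.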