Vision Language Navigation with Knowledge-driven Environmental Dreamer

Authors: Fengda Zhu, Vincent CS Lee, Xiaojun Chang, Xiaodan Liang

IJCAI 2023

Reproducibility Variable Result LLM Response
Research Type Experimental The navigation agent with the KED method outperforms state-of-the-art methods on various VLN benchmarks, such as R2R, R4R, and RxR. Both qualitative and quantitative experiments prove that our proposed KED method is able to generate high-quality augmentation data with texture consistency and structure consistency.
Researcher Affiliation Academia Fengda Zhu (Monash University), Vincent CS Lee (Monash University), Xiaojun Chang (University of Technology Sydney), Xiaodan Liang (Sun Yat-sen University; Mohamed bin Zayed University of Artificial Intelligence, MBZUAI)
Pseudocode Yes Algorithm 1 Selecting key vertexes
Open Source Code No The paper does not provide any links to open-source code or explicitly state that the code for the methodology is available.
Open Datasets Yes The most widely used VLN dataset, Room-to-Room dataset [Anderson et al., 2018b], contains only 22K instruction-path pairs from 90 house scenes. We evaluate our navigation agent on three VLN benchmarks: Room-to-Room (R2R), Room-for-Room (R4R), and Room-Across-Room (RxR). (A data-loading sketch for this format appears after the table.)
Dataset Splits Yes Table 1: Comparison of agent performance on R2R in single-run setting. * reproduced results in our environment. Splits reported: R2R Validation Seen, R2R Validation Unseen, R2R Test Unseen.
Hardware Specification No The paper mentions 'Due to the limit of computation resources' but does not specify any hardware details like GPU/CPU models or memory.
Software Dependencies No The paper mentions models like 'CLIP' and 'BERT' but does not provide specific version numbers for any software dependencies.
Experiment Setup No The paper describes the general learning strategy (Imitation Learning, Reinforcement Learning, loss functions), but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, or specific lambda values for loss weights). (A sketch of such a mixed IL+RL objective appears after the table.)
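For context on the Open Datasets row, below is a minimal sketch of loading and counting R2R-style instruction-path pairs. The JSON field names (scan, path, instructions) follow the public Room-to-Room release and are assumptions rather than details taken from this paper; the file path is hypothetical.

```python
import json

# Hypothetical path to an R2R-style split file (e.g., R2R_train.json from the
# public Room-to-Room release); not a file referenced by this paper.
SPLIT_FILE = "data/R2R_train.json"

def load_instruction_path_pairs(split_file):
    """Flatten an R2R-style split into (scan, path, instruction) records.

    Assumes each episode carries a Matterport3D scan id, a viewpoint path,
    and a list of natural-language instructions, as in the public R2R data.
    """
    with open(split_file) as f:
        episodes = json.load(f)

    pairs = []
    for ep in episodes:
        for instruction in ep["instructions"]:
            pairs.append({
                "scan": ep["scan"],          # Matterport3D house/scene id
                "path": ep["path"],          # sequence of viewpoint ids
                "instruction": instruction,  # one navigation instruction
            })
    return pairs

if __name__ == "__main__":
    pairs = load_instruction_path_pairs(SPLIT_FILE)
    scenes = {p["scan"] for p in pairs}
    # The paper cites roughly 22K instruction-path pairs across 90 house
    # scenes for the full R2R dataset (all splits combined).
    print(f"{len(pairs)} instruction-path pairs across {len(scenes)} scenes")
```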
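For the Experiment Setup row, here is a minimal sketch of the kind of mixed imitation-learning plus reinforcement-learning objective the paper describes, written in PyTorch style. The λ weight is a placeholder because the paper does not report loss-weight values, and the function and tensor names are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def mixed_vln_loss(action_logits, teacher_actions, log_probs, advantages,
                   lambda_il=0.2):
    """Combine an RL policy-gradient term with an IL (teacher-forcing) term.

    lambda_il is a placeholder weight: the paper states that weighted loss
    terms are used but does not report their values.
    """
    # Imitation learning: cross-entropy against teacher (ground-truth) actions.
    il_loss = F.cross_entropy(action_logits, teacher_actions)

    # Reinforcement learning: REINFORCE-style policy gradient with advantages.
    rl_loss = -(log_probs * advantages.detach()).mean()

    return rl_loss + lambda_il * il_loss
```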