Vision-Language Navigation with Energy-Based Policy

Authors: Rui Liu, Wenguan Wang, Yi Yang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that ENP outperforms its counterparts, e.g., 2% SR and 1% SPL gains on R2R [1], 1.22% RGS on REVERIE [30], 2% SR on R2R-CE [31], and 1.07% NDTW on RxR-CE [32], respectively. We examine the efficacy of ENP for VLN in discrete environments (§4.1) and in more challenging continuous environments (§4.2). Then we provide diagnostic analysis of core model design (§4.3).
Researcher Affiliation | Academia | Rui Liu1,2, Wenguan Wang2, Yi Yang1,2; 1The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; 2College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Pseudocode | Yes | Algorithm 1: Energy-based Navigation Policy (ENP) Learning Algorithm
Open Source Code | No | As we promised, the code will be released upon the publication of our paper.
Open Datasets | Yes | R2R [1] contains 7,189 shortest-path trajectories captured from 90 real-world building-scale scenes [87]. It consists of 22K step-by-step navigation instructions. REVERIE [30] contains 21,702 high-level instructions... Both R2R [1] and REVERIE [30] are built upon the Matterport3D Simulator [1]. R2R-CE [31] and RxR-CE [32] are more practical yet challenging...
Dataset Splits | Yes | All these datasets are divided into train, val seen, val unseen, and test unseen splits, which mainly assess generalization to unseen environments.
Hardware Specification | Yes | All experiments are conducted on a single NVIDIA 4090 GPU with 24 GB memory in PyTorch. Testing is conducted on the same machine.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or for any other software dependencies, such as specific libraries or optimizers.
Experiment Setup | Yes | For fairness, the hyper-parameters (e.g., batch size, optimizer, maximum iterations, learning rates) of these models are kept at their original settings. For SGLD, ŝ0 is sampled from M with a probability of 95% (Eq. 9). In EnvDrop [3] and VLN-BERT [4], the number of SGLD iterations is set as I = 15. Ipre = 20 and Ift = 5 are used for the pre-training and fine-tuning stages of DUET [6] and ETPNav [29]. ... The step size ε = 1.5 and ξ ∼ N(0, 0.01) are set in ENP (§4.3).
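The SGLD loop described above (I iterations, step size ε = 1.5, Gaussian noise ξ ∼ N(0, 0.01)) can be sketched as follows. This is a generic Langevin-dynamics illustration under stated assumptions, not the authors' implementation: the `energy_grad` callable, the toy quadratic energy, and the `sgld_sample` helper name are all hypothetical.

```python
import numpy as np

def sgld_sample(energy_grad, s0, num_iters=15, step_size=1.5,
                noise_std=0.01, rng=None):
    """Stochastic Gradient Langevin Dynamics sampling sketch.

    Update rule: s_{t+1} = s_t - (eps / 2) * dE/ds(s_t) + xi,
    with xi ~ N(0, noise_std^2), matching the reported settings
    (I = 15 iterations, eps = 1.5, xi ~ N(0, 0.01)).
    """
    rng = rng or np.random.default_rng(0)
    s = np.array(s0, dtype=float)
    for _ in range(num_iters):
        grad = energy_grad(s)                          # dE/ds at current sample
        noise = rng.normal(0.0, noise_std, size=s.shape)
        s = s - 0.5 * step_size * grad + noise         # Langevin update step
    return s

# Toy quadratic energy E(s) = ||s||^2 / 2, so dE/ds = s.
sample = sgld_sample(lambda s: s, s0=np.ones(4))
```

In the paper's setup the initial state ŝ0 would additionally be drawn from a replay buffer M with 95% probability (Eq. 9) rather than always from a fixed point; that buffer logic is omitted here for brevity.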