Vision-Language Navigation with Energy-Based Policy
Authors: Rui Liu, Wenguan Wang, Yi Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that ENP outperforms the counterparts, e.g., 2% SR and 1% SPL gains on R2R [1], 1.22% RGS on REVERIE [30], 2% SR on R2R-CE [31], and 1.07% NDTW on RxR-CE [32], respectively. We examine the efficacy of ENP for VLN in discrete environments (§4.1), and more challenging continuous environments (§4.2). Then we provide diagnostic analysis on core model design (§4.3). |
| Researcher Affiliation | Academia | Rui Liu (1,2), Wenguan Wang (2), Yi Yang (1,2); 1: The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China; 2: College of Computer Science and Technology, Zhejiang University, Hangzhou, China |
| Pseudocode | Yes | Algorithm 1 Energy-based Navigation Policy (ENP) Learning Algorithm |
| Open Source Code | No | As we promised, the code will be released upon the publication of our paper. |
| Open Datasets | Yes | R2R [1] contains 7,189 shortest-path trajectories captured from 90 real-world building-scale scenes [87]. It consists of 22K step-by-step navigation instructions. REVERIE [30] contains 21,702 high-level instructions... Both R2R [1] and REVERIE [30] are built upon Matterport3D Simulator [1]. Datasets. R2R-CE [31] and RxR-CE [32] are more practical yet challenging... |
| Dataset Splits | Yes | All these datasets are divided into train, val seen, val unseen, and test unseen splits, which mainly focus on the generalization capability in unseen environments. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA 4090 GPU with 24GB memory in PyTorch. Testing is conducted on the same machine. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or any other software dependencies, such as specific libraries or optimizers with their versions. |
| Experiment Setup | Yes | For fairness, the hyper-parameters (e.g., batch size, optimizers, maximal iterations, learning rates) of these models are kept as in the original setup. For SGLD, ŝ0 is sampled from M with a probability of 95% (Eq. 9). In EnvDrop [3] and VLN-BERT [4], the number of SGLD iterations is set as I = 15. Ipre = 20 and Ift = 5 are used for the pre-training and fine-tuning stages of DUET [6] and ETPNav [29]. ... The step size ϵ = 1.5 and ξ ∼ N(0, 0.01) are set in ENP (§4.3). |
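The SGLD settings quoted above (I = 15 iterations, step size ϵ = 1.5, noise ξ ∼ N(0, 0.01), and re-initialization from a replay buffer M with 95% probability) can be sketched as a minimal sampling loop. This is an illustrative sketch only, not the authors' released implementation: the energy gradient `grad_fn` is a hypothetical stand-in for the learned navigation-state energy, and the flat-list state representation is an assumption.

```python
import random

def sgld_sample(grad_fn, replay_buffer, dim,
                n_steps=15, step_size=1.5, noise_std=0.01, buffer_prob=0.95):
    """One round of Stochastic Gradient Langevin Dynamics sampling.

    grad_fn: gradient of the energy w.r.t. the state (hypothetical
    stand-in for the paper's learned energy function).
    """
    # Re-initialize from the replay buffer with probability 0.95,
    # otherwise start from standard Gaussian noise.
    if replay_buffer and random.random() < buffer_prob:
        s = list(random.choice(replay_buffer))
    else:
        s = [random.gauss(0.0, 1.0) for _ in range(dim)]

    for _ in range(n_steps):
        g = grad_fn(s)
        # Langevin update: gradient step on the energy plus small noise.
        s = [si - step_size * gi + random.gauss(0.0, noise_std)
             for si, gi in zip(s, g)]

    replay_buffer.append(s)  # store the sample back into the buffer M
    return s

# Toy quadratic energy E(s) = 0.5 * ||s||^2, so grad E(s) = s.
buffer = []
sample = sgld_sample(lambda s: s, buffer, dim=4)
```

With the toy quadratic energy, each step contracts the state toward the low-energy region (the origin) while the N(0, 0.01) noise keeps the chain stochastic; in the paper this machinery draws negative samples for the energy-based policy objective.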