Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Vision-Language Navigation with Energy-Based Policy
Authors: Rui Liu, Wenguan Wang, Yi Yang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that ENP outperforms the counterparts, e.g., 2% SR and 1% SPL gains over R2R [1], 1.22% RGS on REVERIE [30], 2% SR on R2R-CE [31], and 1.07% NDTW on RxR-CE [32], respectively. We examine the efficacy of ENP for VLN in discrete environments (§4.1), and more challenging continuous environments (§4.2). Then we provide diagnostic analysis on core model design (§4.3). |
| Researcher Affiliation | Academia | Rui Liu1,2 Wenguan Wang2 Yi Yang1,2, 1The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou, China 2College of Computer Science and Technology, Zhejiang University, Hangzhou, China |
| Pseudocode | Yes | Algorithm 1 Energy-based Navigation Policy (ENP) Learning Algorithm |
| Open Source Code | No | As we promised, the code will be released upon the publication of our paper. |
| Open Datasets | Yes | R2R [1] contains 7,189 shortest-path trajectories captured from 90 real-world building-scale scenes [87]. It consists of 22K step-by-step navigation instructions. REVERIE [30] contains 21,702 high-level instructions... Both R2R [1] and REVERIE [30] are built upon Matterport3D Simulator [1]. Datasets. R2R-CE [31] and RxR-CE [32] are more practical yet challenging... |
| Dataset Splits | Yes | All these datasets are divided into train, val seen, val unseen, and test unseen splits, which mainly focus on the generalization capability in unseen environments. |
| Hardware Specification | Yes | All experiments are conducted on a single NVIDIA 4090 GPU with 24GB memory in PyTorch. Testing is conducted on the same machine. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number for it or any other software dependencies, such as specific libraries or optimizers with their versions. |
| Experiment Setup | Yes | For fairness, the hyper-parameters (e.g., batch size, optimizers, maximal iterations, learning rates) of these models are kept the original setup. For SGLD, Λs0 is sampled from M with a probability of 95% (Eq. 9). In EnvDrop [3] and VLN BERT [4], the number of SGLD iterations is set as I = 15. Ipre = 20 and Ift = 5 of the pre-training and finetuning stages are used for DUET [6] and ETPNav [29]. ... The step size ϵ = 1.5 and ξ ∼ N(0, 0.01) are set in ENP (§4.3). |
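
The SGLD settings quoted in the Experiment Setup row (15 iterations, step size 1.5, Gaussian noise N(0, 0.01), and a 95% chance of initialising from M) can be illustrated with a minimal sketch of generic stochastic gradient Langevin dynamics. This is not the paper's implementation: the quadratic toy energy, the list-based `memory` standing in for M, the uniform fallback initialisation, the `step_size / 2` scaling of the gradient, and the reading of 0.01 as a variance are all assumptions made for illustration.

```python
import numpy as np


def toy_energy_grad(x, target):
    """Gradient of a hypothetical toy energy E(x) = 0.5 * ||x - target||^2."""
    return x - target


def sgld_sample(memory, target, rng, n_iters=15, step_size=1.5,
                noise_var=0.01, p_memory=0.95):
    """Draw one sample with generic SGLD (illustrative, not the ENP code)."""
    dim = target.shape[0]
    # Assumption: start from a stored sample with probability p_memory,
    # otherwise from uniform noise in [-1, 1].
    if memory and rng.random() < p_memory:
        x = memory[rng.integers(len(memory))].copy()
    else:
        x = rng.uniform(-1.0, 1.0, size=dim)

    noise_std = np.sqrt(noise_var)  # assumption: 0.01 is a variance
    for _ in range(n_iters):
        noise = rng.normal(0.0, noise_std, size=dim)
        # Langevin step: move down the energy gradient, then add noise.
        x = x - 0.5 * step_size * toy_energy_grad(x, target) + noise

    memory.append(x.copy())  # store the result for later initialisations
    return x


rng = np.random.default_rng(0)
target = np.array([2.0, -1.0])
memory = []
samples = np.stack([sgld_sample(memory, target, rng) for _ in range(200)])
```

With this toy energy the samples concentrate around `target`, because low energy corresponds to high probability; in the paper the energy would instead come from the navigation policy network, which this sketch does not model.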