Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation
Authors: Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on a challenging Vision-and-Language Navigation (VLN) task with photorealistic images, and achieve superior performance compared to previous navigation architectures. For instance, we achieve a 53% success rate on the test split of the Room-to-Room navigation task [1] through pure imitation learning, outperforming previous navigation architectures by up to 5%. |
| Researcher Affiliation | Academia | Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky; Department of Computer Science, Princeton University; {zhiweid, karthikn, olgarus}@cs.princeton.edu |
| Pseudocode | No | The paper describes the model architecture and processes in text and diagrams, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our implementation is available at https://github.com/Lucas2012/EvolvingGraphicalPlanner. |
| Open Datasets | Yes | We evaluate our method on the standard benchmark datasets for Vision-and-Language Navigation (VLN). (1) Room-to-Room (R2R) benchmark [1]... (2) Room-for-Room (R4R) [18]... |
| Dataset Splits | Yes | Table 1: Dataset statistics. R2R: Train 14,039; Val seen 1,021; Val unseen 2,349; Test 4,173. R4R: Train 233,532; Val seen 1,035; Val unseen 45,234; no test split reported. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU specifications, or cloud instance types). |
| Software Dependencies | No | The paper mentions using an LSTM and the Adam optimizer but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | In the Evolving Graphical Planner, we use 256 dimensions as the graph embedding size for both the full graph and the proxy graph. The propagation model uses three iterations of message passing operations. For every expansion step, the default setting adds all the possible navigable locations into the graph (top-K is set to 16, the maximum number of navigable locations in both datasets). The model is trained jointly, using Adam [49] with 1e-4 as the default learning rate. |
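
The success rates quoted in the Research Type row follow the standard R2R evaluation protocol from Anderson et al. [1], in which an episode counts as a success if the agent stops within 3 meters of the goal. Below is a minimal sketch of that metric; the function name is illustrative, and the use of Euclidean distance is a simplification to keep the sketch self-contained (the benchmark measures geodesic distance along the environment's navigation graph).

```python
import numpy as np

def success_rate(final_positions, goal_positions, threshold_m=3.0):
    """Fraction of episodes ending within `threshold_m` meters of the goal,
    the standard R2R success criterion. Positions are (N, 3) coordinates.
    The real benchmark uses geodesic (navigation-graph) distance; plain
    Euclidean distance is used here only to keep the sketch self-contained.
    """
    dists = np.linalg.norm(
        np.asarray(final_positions) - np.asarray(goal_positions), axis=1
    )
    return float((dists <= threshold_m).mean())

# Toy example: two of three illustrative episodes stop within 3 m of the goal.
finals = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
goals = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [2.0, 1.0, 0.0]]
print(success_rate(finals, goals))  # -> 0.666...
```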
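
The counts in the Dataset Splits row are easy to mistranscribe, so a small lookup is handy when sanity-checking a data-loading pipeline against the paper's Table 1. The dictionary below simply mirrors the reported numbers; its layout is an illustrative convenience, not taken from the authors' code, and R4R has no test split listed.

```python
# Split sizes as reported in Table 1 of the paper.
DATASET_SPLITS = {
    "R2R": {"train": 14_039, "val_seen": 1_021, "val_unseen": 2_349, "test": 4_173},
    "R4R": {"train": 233_532, "val_seen": 1_035, "val_unseen": 45_234},  # no test split
}

# Quick consistency check: 14,039 + 1,021 + 2,349 + 4,173 = 21,582.
assert sum(DATASET_SPLITS["R2R"].values()) == 21_582
```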
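
The Experiment Setup row pins down four hyperparameters: 256-dimensional graph embeddings for both the full and proxy graphs, three message-passing iterations, top-K = 16 for node expansion, and Adam with a 1e-4 learning rate. The sketch below shows one way those values could be wired into a PyTorch training setup; the `EGPPlanner` class and its constructor are hypothetical stand-ins, since the paper gives no pseudocode, and the released repository above is the authoritative reference.

```python
import torch

# Numeric values come from the paper's experiment setup; the EGPPlanner class
# and its internals are hypothetical placeholders, not the authors' code.
GRAPH_EMBED_DIM = 256        # embedding size for full graph and proxy graph
NUM_PROPAGATION_STEPS = 3    # message-passing iterations
TOP_K_EXPANSION = 16         # max navigable locations added per expansion step
LEARNING_RATE = 1e-4         # default Adam learning rate

class EGPPlanner(torch.nn.Module):
    """Illustrative skeleton only; the real layers live in the released code."""
    def __init__(self, embed_dim, num_prop_steps, top_k):
        super().__init__()
        self.num_prop_steps = num_prop_steps
        self.top_k = top_k
        # One placeholder layer so the module has trainable parameters.
        self.node_encoder = torch.nn.Linear(embed_dim, embed_dim)

model = EGPPlanner(GRAPH_EMBED_DIM, NUM_PROPAGATION_STEPS, TOP_K_EXPANSION)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```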