Evolving Graphical Planner: Contextual Global Planning for Vision-and-Language Navigation

Authors: Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our model on a challenging Vision-and-Language Navigation (VLN) task with photorealistic images, and achieve superior performance compared to previous navigation architectures. For instance, we achieve a 53% success rate on the test split of the Room-to-Room navigation task [1] through pure imitation learning, outperforming previous navigation architectures by up to 5%."
Researcher Affiliation | Academia | "Zhiwei Deng, Karthik Narasimhan, Olga Russakovsky. Department of Computer Science, Princeton University. {zhiweid, karthikn, olgarus}@cs.princeton.edu"
Pseudocode | No | The paper describes the model architecture and processes in text and diagrams, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our implementation is available at https://github.com/Lucas2012/EvolvingGraphicalPlanner."
Open Datasets | Yes | "We evaluate our method on the standard benchmark datasets for Vision-and-Language Navigation (VLN). (1) Room-to-Room (R2R) benchmark [1]... (2) Room-for-Room (R4R) [18]..."
Dataset Splits | Yes | "Table 1: Dataset statistics." R2R: Train 14,039; Val-seen 1,021; Val-unseen 2,349; Test 4,173. R4R: Train 233,532; Val-seen 1,035; Val-unseen 45,234; no test split. (These sizes are restated as a small configuration sketch after the table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments (e.g., GPU models, CPU specifications, or cloud instance types).
Software Dependencies | No | The paper mentions using an LSTM and the Adam optimizer but does not specify version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | "In the Evolving Graphical Planner, we use 256 dimensions as the graph embedding size for both the full graph and the proxy graph. The propagation model uses three iterations of message passing operations. For every expansion step, the default setting adds all the possible navigable locations into the graph (top-K is set to 16, the maximum number of navigable locations in both datasets). The model is trained jointly, using Adam [49] with 1e-4 as the default learning rate." (See the hyperparameter sketch after the table.)
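
For quick reference, the split sizes quoted from Table 1 can be collected into a small configuration dictionary. This is a minimal bookkeeping sketch; the name DATASET_SPLITS and the layout are our own and do not come from the authors' code.

```python
# Split sizes as quoted from Table 1 of the paper.
# DATASET_SPLITS is an illustrative name, not from the authors' repository.
DATASET_SPLITS = {
    "R2R": {"train": 14_039, "val_seen": 1_021, "val_unseen": 2_349, "test": 4_173},
    "R4R": {"train": 233_532, "val_seen": 1_035, "val_unseen": 45_234, "test": None},  # R4R reports no test split
}

for dataset, splits in DATASET_SPLITS.items():
    total = sum(v for v in splits.values() if v is not None)
    print(f"{dataset}: {total:,} instructions across reported splits")
```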
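To make the quoted setup concrete, the sketch below wires the reported numbers (256-dimensional graph embeddings, three message-passing iterations, top-K of 16, Adam at 1e-4) into a minimal PyTorch propagation module. The class name GraphPropagator, the GRU-cell node update, and the dense row-normalized adjacency are illustrative assumptions on our part; the authors' actual model lives in the linked repository.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper; the module below is a hypothetical
# sketch of message passing, not the authors' implementation.
EMBED_DIM = 256       # graph embedding size (full graph and proxy graph)
NUM_ITERATIONS = 3    # message-passing iterations in the propagation model
TOP_K = 16            # max navigable locations added per expansion step
LEARNING_RATE = 1e-4  # default Adam learning rate

class GraphPropagator(nn.Module):
    """Minimal message passing over node embeddings with a dense adjacency."""

    def __init__(self, dim: int = EMBED_DIM, iters: int = NUM_ITERATIONS):
        super().__init__()
        self.iters = iters
        self.message = nn.Linear(dim, dim)  # transform neighbor features into messages
        self.update = nn.GRUCell(dim, dim)  # fold aggregated messages into node state

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (N, dim) node embeddings; adj: (N, N) row-normalized adjacency.
        for _ in range(self.iters):
            msgs = adj @ self.message(nodes)  # aggregate messages from neighbors
            nodes = self.update(msgs, nodes)  # GRU-style node update
        return nodes

model = GraphPropagator()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```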