Language and Visual Entity Relationship Graph for Agent Navigation
Authors: Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that by taking advantage of the relationships we are able to improve over state-of-the-art. On the Room-to-Room (R2R) benchmark, our method achieves the new best performance on the test unseen split with success rate weighted by path length (SPL) of 52%. On the Room-for-Room (R4R) dataset, our method significantly improves the previous best from 13% to 34% on the success weighted by normalized dynamic time warping (SDTW). |
| Researcher Affiliation | Academia | Yicong Hong¹, Cristian Rodriguez-Opazo¹, Yuankai Qi², Qi Wu², Stephen Gould¹ — ¹Australian National University, ²The University of Adelaide, ¹٬²Australian Centre for Robotic Vision. {yicong.hong, cristian.rodriguez, stephen.gould}@anu.edu.au; qykshr@gmail.com; qi.wu01@adelaide.edu.au |
| Pseudocode | No | The paper describes the proposed model and algorithms using text and mathematical equations, but it does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/YicongHong/Entity-Graph-VLN. |
| Open Datasets | Yes | The Room-to-Room (R2R) dataset [3] consists of 10,567 panoramic views in 90 real-world environments as well as 7,189 trajectories where each is described by three natural language instructions. [...] To show the generalizability of our proposed agent, we also evaluate the agent's performance on the Room-for-Room (R4R) dataset [17], an extended version of R2R, with longer instructions and trajectories. |
| Dataset Splits | Yes | The dataset is split into train, validation seen, validation unseen and test unseen sets. |
| Hardware Specification | No | The paper mentions using image representations from ResNet-152 and Faster RCNN for features, but it does not specify the hardware (e.g., GPU models, CPU, memory) used for running the experiments or training the models. |
| Software Dependencies | No | The paper mentions using ResNet-152, Faster RCNN, GloVe, and Stanford NLP Parser, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | No | The paper describes a two-stage training process and mentions using a single-step update for efficiency, but it lacks specific hyperparameters (e.g., learning rate, batch size, number of epochs, optimizer settings) in the main text. It refers to an Appendix for more details, which is not provided. |
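The headline results above are reported in success rate weighted by path length (SPL). For readers checking the reported numbers, a minimal sketch of the standard SPL metric (Anderson et al., 2018) is below; the function name and argument layout are illustrative, not taken from the paper's released code:

```python
def spl(successes, shortest_lengths, path_lengths):
    """Success weighted by Path Length (SPL), Anderson et al. 2018.

    successes        -- 1 if the episode ended within the success radius, else 0
    shortest_lengths -- geodesic distance from start to goal for each episode
    path_lengths     -- length of the path the agent actually took
    """
    total = 0.0
    for s, l, p in zip(successes, shortest_lengths, path_lengths):
        # Each successful episode is weighted by how efficient the path was;
        # max(p, l) prevents rewarding paths shorter than the shortest path.
        total += s * l / max(p, l)
    return total / len(successes)
```

For example, an agent that succeeds on one of two episodes but takes twice the shortest path on the successful one scores `spl([1, 0], [10, 10], [20, 5]) == 0.25`.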