Chasing Ghosts: Instruction Following as Bayesian State Tracking
Authors: Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our approach outperforms a strong LingUNet [2] baseline when predicting the goal location on the map. On the full VLN task, i.e., navigating to the goal location, our approach achieves promising results with less reliance on navigation constraints. |
| Researcher Affiliation | Collaboration | 1Georgia Institute of Technology, 2Facebook AI Research, 3Oregon State University. {peter.anderson, ayshrv, parikh, dbatra}@gatech.edu, leestef@oregonstate.edu |
| Pseudocode | No | The paper describes the algorithms and models in prose and mathematical equations but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | PyTorch code will be released to replicate all experiments. https://github.com/batra-mlp-lab/vln-chasing-ghosts |
| Open Datasets | Yes | R2R instruction dataset. We evaluate using the Room-to-Room (R2R) dataset for Vision-and-Language Navigation (VLN) [1]. The dataset consists of 22K open-vocabulary, crowd-sourced navigation instructions with an average length of 29 words. Each instruction corresponds to a 5–24m trajectory in the Matterport3D dataset, traversing 5–7 viewpoint transitions. |
| Dataset Splits | Yes | Instructions are divided into splits for training, validation, and testing. The validation set is further split into two components: val-seen, where instructions and trajectories are situated in environments seen during training, and val-unseen, containing instructions situated in environments that are not seen during training. |
| Hardware Specification | No | The paper mentions extending the Matterport3D simulator and discusses frame rates subject to 'GPU performance and CPU-GPU memory bandwidth', and also states 'We also use a less powerful CNN (ResNet-34 vs. ResNet-152 in prior work)'. However, it does not specify any exact models of GPUs, CPUs, or other specific hardware components used for training or inference. |
| Software Dependencies | No | The paper mentions 'PyTorch code will be released' and 'ResNet-34'. However, it does not provide specific version numbers for PyTorch or any other software dependencies needed for reproducibility. |
| Experiment Setup | No | The paper states 'Training data for the model consists of instruction-trajectory pairs (X, s_{1:T}). In all experiments we train the filter using supervised learning by minimizing the KL-divergence between the predicted belief b_{1:T} and the true trajectory from the start to the goal s_{1:T}, backpropagating gradients through the previous belief b_{t-1} at each step.' It also mentions 'The policy is trained with cross-entropy loss to maximize the likelihood of the ground-truth target action'. While it describes the losses and training approach, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings in the main text, stating only 'We provide further implementation details in the supplementary material'. |
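
Although the main text defers hyperparameters to the supplementary material, the two quoted objectives are concrete enough to sketch. Below is a minimal PyTorch illustration of those losses as described (not the authors' released implementation): the function names, tensor shapes, belief flattening to H*W map cells, and the `batchmean` reduction are all our assumptions.

```python
import torch
import torch.nn.functional as F

def filter_loss(pred_log_belief: torch.Tensor, true_traj: torch.Tensor) -> torch.Tensor:
    """KL(true trajectory || predicted belief), averaged over timesteps.

    pred_log_belief: (T, H*W) log-probabilities over map cells (b_{1:T}).
    true_traj:       (T, H*W) ground-truth distribution (s_{1:T}); with a
                     one-hot target this reduces to per-step cross-entropy.
    """
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(pred_log_belief, true_traj, reduction="batchmean")

def policy_loss(action_logits: torch.Tensor, gt_actions: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against the ground-truth target action at each step."""
    return F.cross_entropy(action_logits, gt_actions)

# Toy usage with assumed shapes: 10 timesteps over a 32x32 belief map.
T, H, W = 10, 32, 32
log_belief = F.log_softmax(torch.randn(T, H * W), dim=-1)
target = torch.zeros(T, H * W)
target[torch.arange(T), torch.randint(H * W, (T,))] = 1.0  # one-hot trajectory
loss = filter_loss(log_belief, target)
```

Per the quoted passage, gradients would also flow through the previous belief b_{t-1} at each filtering step, i.e., the belief maps above would be produced recurrently rather than independently per timestep.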