Chasing Ghosts: Instruction Following as Bayesian State Tracking
Authors: Peter Anderson, Ayush Shrivastava, Devi Parikh, Dhruv Batra, Stefan Lee
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that our approach outperforms a strong LingUNet [2] baseline when predicting the goal location on the map. On the full VLN task, i.e., navigating to the goal location, our approach achieves promising results with less reliance on navigation constraints. |
| Researcher Affiliation | Collaboration | 1Georgia Institute of Technology, 2Facebook AI Research, 3Oregon State University. {peter.anderson, ayshrv, parikh, dbatra}@gatech.edu, leestef@oregonstate.edu |
| Pseudocode | No | The paper describes the algorithms and models in prose and mathematical equations but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | PyTorch code will be released to replicate all experiments. https://github.com/batra-mlp-lab/vln-chasing-ghosts |
| Open Datasets | Yes | R2R instruction dataset. We evaluate using the Room-to-Room (R2R) dataset for Vision-and-Language Navigation (VLN) [1]. The dataset consists of 22K open-vocabulary, crowd-sourced navigation instructions with an average length of 29 words. Each instruction corresponds to a 5–24m trajectory in the Matterport3D dataset, traversing 5–7 viewpoint transitions. |
| Dataset Splits | Yes | Instructions are divided into splits for training, validation, and testing. The validation set is further split into two components: val-seen, where instructions and trajectories are situated in environments seen during training, and val-unseen, containing instructions situated in environments that are not seen during training. |
| Hardware Specification | No | The paper mentions extending the Matterport3D simulator and discusses frame rates subject to 'GPU performance and CPU-GPU memory bandwidth', and also states 'We also use a less powerful CNN (ResNet-34 vs. ResNet-152 in prior work)'. However, it does not specify any exact models of GPUs, CPUs, or other specific hardware components used for training or inference. |
| Software Dependencies | No | The paper mentions 'PyTorch code will be released' and 'ResNet-34'. However, it does not provide specific version numbers for PyTorch or any other software dependencies needed for reproducibility. |
| Experiment Setup | No | The paper states 'Training data for the model consists of instruction-trajectory pairs (X, s_{1:T}). In all experiments we train the filter using supervised learning by minimizing the KL-divergence between the predicted belief b_{1:T} and the true trajectory from the start to the goal s_{1:T}, backpropagating gradients through the previous belief b_{t-1} at each step.' It also mentions 'The policy is trained with cross-entropy loss to maximize the likelihood of the ground-truth target action'. While it describes the losses and training approach, it does not provide specific hyperparameters (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings in the main text, stating only 'We provide further implementation details in the supplementary material'. |
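
Although the main text defers hyperparameters to the supplementary material, the two quoted objectives are concrete enough to sketch. Below is a minimal PyTorch illustration of those losses as described (not the authors' released implementation): the function names, tensor shapes, belief flattening to H*W map cells, and the `batchmean` reduction are all our assumptions.

```python
import torch
import torch.nn.functional as F

def filter_loss(pred_log_belief: torch.Tensor, true_traj: torch.Tensor) -> torch.Tensor:
    """KL(true trajectory || predicted belief), averaged over timesteps.

    pred_log_belief: (T, H*W) log-probabilities over map cells (b_{1:T}).
    true_traj:       (T, H*W) ground-truth distribution (s_{1:T}); with a
                     one-hot target this reduces to per-step cross-entropy.
    """
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(pred_log_belief, true_traj, reduction="batchmean")

def policy_loss(action_logits: torch.Tensor, gt_actions: torch.Tensor) -> torch.Tensor:
    """Cross-entropy against the ground-truth target action at each step."""
    return F.cross_entropy(action_logits, gt_actions)

# Toy usage with assumed shapes: 10 timesteps over a 32x32 belief map.
T, H, W = 10, 32, 32
log_belief = F.log_softmax(torch.randn(T, H * W), dim=-1)
target = torch.zeros(T, H * W)
target[torch.arange(T), torch.randint(H * W, (T,))] = 1.0  # one-hot trajectory
loss = filter_loss(log_belief, target)
```

Per the quoted passage, gradients would also flow through the previous belief b_{t-1} at each filtering step, i.e., the belief maps above would be produced recurrently rather than independently per timestep.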