Interactive Grounded Language Acquisition and Generalization in a 2D World

Authors: Haonan Yu, Haichao Zhang, Wei Xu

ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on randomly generated XWORLD maps with random agent positions, on a population of over 1.6 million distinct sentences consisting of 119 object words, 8 color words, 9 spatial-relation words, and 50 grammatical words. Detailed analysis (Appendix A) of the trained model shows that the language is grounded in such a way that the words are capable of picking out referents in the environment. We specifically test the agent's ability to generalize to zero-shot sentences. The average NAV success rates are 84.3% for ZS1 and 85.2% for ZS2 when the zero-shot portion is half, comparable to the rate of 90.5% in a normal language setting. The average QA accuracies are 97.8% for ZS1 and 97.7% for ZS2 when the zero-shot portion is half, almost as good as the accuracy of 99.7% in a normal language setting.
Researcher Affiliation | Collaboration | Haonan Yu (1), Haichao Zhang (1) & Wei Xu (1,2); (1) Baidu Research, Sunnyvale, USA; (2) National Engineering Laboratory for Deep Learning Technology and Applications, Beijing, China; {haonanyu,zhanghaichao,wei.xu}@baidu.com
Pseudocode | No | The paper describes its model and approach using mathematical equations and textual explanations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | We created a 2D maze-like world called XWORLD (Figure 1), as a testbed for interactive grounded language acquisition and generalization. https://github.com/PaddlePaddle/XWorld
Open Datasets | No | The paper describes the creation of a '2D maze-like world called XWORLD' and states it evaluates the model 'on randomly generated XWORLD maps with random agent positions, on a population of over 1.6 million distinct sentences'. While it provides a GitHub link for XWORLD, this link is to the environment's code that generates the data, not a direct link to a pre-existing, downloadable dataset used for training.
Dataset Splits | No | The paper extensively discusses training and testing, including 'training reward curves' and 'testing results'. However, it does not explicitly mention or describe a separate validation set or split for hyperparameter tuning or model selection.
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions general aspects of the experimental setup and software.
Software Dependencies | No | The paper mentions using 'adagrad (Duchi et al., 2011)' for optimization and refers to various neural network components like 'RNN', 'CNN', 'LSTM', and 'GRU'. However, it does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used.
Experiment Setup | Yes | In the following experiments, we train all six approaches (four comparison methods, one ablation, and our model) with a small learning rate of 1 × 10^-5 and a batch size of 16, for a maximum of 200k minibatches. Additional training details are described in Appendix C. After training, we test each approach on 50k sessions. For NAV, we compute the average success rate of navigation, where a success is defined as reaching the correct location before the session times out. For QA, we compute the average accuracy in answering the questions.
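To make the quoted protocol concrete, the sketch below shows one plausible way to compute the reported metrics: the average NAV success rate and QA accuracy over 50k test sessions, with the stated training hyperparameters collected in a config. This is a hypothetical illustration, not the authors' code; `run_session` is a stand-in for the paper's actual session loop in XWORLD.

```python
# Hypothetical sketch (not from the paper) of the evaluation protocol in the
# "Experiment Setup" row: average NAV success rate and QA accuracy over 50k
# test sessions. `run_session` is a placeholder for the real session logic.
from typing import Callable, Tuple

TRAIN_CONFIG = {
    "learning_rate": 1e-5,     # "small learning rate of 1 x 10^-5"
    "batch_size": 16,
    "max_minibatches": 200_000,
}

NUM_TEST_SESSIONS = 50_000


def evaluate(run_session: Callable[[], Tuple[bool, int, int]],
             num_sessions: int = NUM_TEST_SESSIONS) -> Tuple[float, float]:
    """run_session() returns (nav_success, qa_correct, qa_asked) for one session.

    NAV success means the agent reached the commanded location before the
    session timed out; QA accuracy is the fraction of questions answered
    correctly across all sessions.
    """
    nav_successes, qa_correct, qa_asked = 0, 0, 0
    for _ in range(num_sessions):
        success, correct, asked = run_session()
        nav_successes += int(success)
        qa_correct += correct
        qa_asked += asked
    nav_success_rate = nav_successes / num_sessions
    qa_accuracy = qa_correct / max(qa_asked, 1)
    return nav_success_rate, qa_accuracy


if __name__ == "__main__":
    # Dummy session: always succeeds at NAV and answers 1 of 1 questions.
    nav_rate, qa_acc = evaluate(lambda: (True, 1, 1), num_sessions=100)
    print(f"NAV success rate: {nav_rate:.3f}, QA accuracy: {qa_acc:.3f}")
```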