Learning to Follow Instructions in Text-Based Games
Authors: Mathieu Tuli, Andrew Li, Pashootan Vaezipoor, Toryn Klassen, Scott Sanner, Sheila McIlraith
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments that show that the performance of state-of-the-art text-based game agents is largely unaffected by the presence or absence of such instructions, and that these agents are typically unable to execute tasks to completion. To further study and address the task of instruction following, we equip RL agents with an internal structured representation of natural language instructions in the form of Linear Temporal Logic (LTL), a formal language that is increasingly used for temporally extended reward specification in RL. Our framework both supports and highlights the benefit of understanding the temporal semantics of instructions and in measuring progress towards achievement of such a temporally extended behaviour. Experiments with 500+ games in TextWorld demonstrate the superior performance of our approach. (A hedged sketch of LTL progression follows the table.) |
| Researcher Affiliation | Academia | Mathieu Tuli, Andrew C. Li, Pashootan Vaezipoor, Toryn Q. Klassen, Scott Sanner, Sheila A. McIlraith. University of Toronto, Toronto, Canada; Vector Institute for Artificial Intelligence, Toronto, Canada; Schwartz Reisman Institute for Technology and Society, Toronto, Canada. {mathieutuli,andrewli,pashootan,toryn,sheila}@cs.toronto.edu, ssanner@mie.utoronto.ca |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. Methods are described textually and with architectural diagrams. |
| Open Source Code | Yes | Our code for the experiments can be found at https://github.com/MathieuTuli/LTL-GATA. The only new asset in this work is our code, which we provide here: https://github.com/MathieuTuli/LTL-GATA. |
| Open Datasets | Yes | We focus on the TextWorld Cooking domain, popularized by Adhikari et al. (2020) and Microsoft's First TextWorld Problems: A Language and Reinforcement Learning Challenge (FTWP) (Trischler et al., 2019). To have as fair a comparison with Adhikari et al. (2020) as possible, we reused the sets of games they had generated. |
| Dataset Splits | Yes | For the training games, they had created two sets: one set that contains 20 unique games per level and another that contains 100 unique games per level. Both the validation and testing sets have 20 unique games each per level. |
| Hardware Specification | Yes | All experiments were run on a single NVIDIA A40 GPU with 48GB of memory and an Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz. The average training time per model was about 2-3 days. |
| Software Dependencies | No | The paper mentions several software components and frameworks used (e.g., TextWorld, GATA, the Spot engine, the Transformer architecture, Double DQN, Adam, RAdam, GPT-3) but does not provide specific version numbers for these or other ancillary software components used in the experiments. |
| Experiment Setup | Yes | We replicate all but three hyper-parameters from Adhikari et al. (2020): (1) we use a batch size of 200 instead of 64 when training on the 100-game set, (2) for level 3, we use Boltzmann action selection, and (3) we use Adam (Kingma & Ba, 2015) with a learning rate of 0.0003 instead of RAdam (Liu et al., 2020) with a learning rate of 0.001. These changes boosted performance for all models. See Appendix H.1 for more details. In Table 2, we list the hyperparameters used for training our models. (A hedged sketch of Boltzmann action selection follows the table.) |
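
The Research Type row above centres on giving RL agents an internal LTL representation of instructions and measuring progress toward a temporally extended behaviour. The snippet below is a minimal, hypothetical Python sketch of LTL progression (rewriting the formula after each step) for a cooking-style instruction such as "take the knife, then chop the carrot", encoded as F(take_knife ∧ F(chop_carrot)). It is not the LTL-GATA implementation; the tuple encoding, event names, and `progress` helper are illustrative assumptions.

```python
# Hypothetical sketch of LTL progression, not the paper's code.
# Formulas are nested tuples: ("prop", name), ("and", f, g), ("or", f, g),
# ("eventually", f), ("true",), ("false",).

def progress(formula, true_events):
    """Rewrite an LTL formula given the set of events true at this step."""
    op = formula[0]
    if op == "prop":                       # atomic proposition
        return ("true",) if formula[1] in true_events else ("false",)
    if op in ("true", "false"):
        return formula
    if op == "and":
        left = progress(formula[1], true_events)
        right = progress(formula[2], true_events)
        if ("false",) in (left, right):
            return ("false",)
        if left == ("true",):
            return right
        if right == ("true",):
            return left
        return ("and", left, right)
    if op == "or":
        left = progress(formula[1], true_events)
        right = progress(formula[2], true_events)
        if ("true",) in (left, right):
            return ("true",)
        if left == ("false",):
            return right
        if right == ("false",):
            return left
        return ("or", left, right)
    if op == "eventually":                 # prog(F f) = prog(f) | F f
        inner = progress(formula[1], true_events)
        if inner == ("true",):
            return ("true",)
        if inner == ("false",):
            return formula
        return ("or", inner, formula)
    raise ValueError(f"unknown operator: {op}")


# "take the knife, then chop the carrot": F(take_knife & F(chop_carrot))
instruction = ("eventually",
               ("and", ("prop", "take_knife"),
                       ("eventually", ("prop", "chop_carrot"))))

step1 = progress(instruction, {"take_knife"})   # partial progress; formula shrinks
step2 = progress(step1, {"chop_carrot"})        # instruction satisfied
assert step2 == ("true",)
```

Once the formula progresses to true, the instruction has been executed to completion; attaching a reward to each simplification step is one way to measure partial progress toward the temporally extended behaviour the paper describes.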
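
The Experiment Setup row notes that level-3 training used Boltzmann action selection. As a generic illustration only (not the paper's code), the sketch below samples a text command with probability proportional to exp(Q/T) over the admissible commands' Q-values; the temperature value is an assumed placeholder, since the paper's setting is not quoted here.

```python
import numpy as np

def boltzmann_action(q_values: np.ndarray, temperature: float = 1.0) -> int:
    """Sample an action index with probability proportional to exp(Q / T)."""
    logits = q_values / temperature
    logits -= logits.max()            # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(np.random.choice(len(q_values), p=probs))

# Example: three admissible text commands with estimated Q-values;
# the Q-values and temperature below are made up for illustration.
q = np.array([0.2, 1.5, 0.7])
action = boltzmann_action(q, temperature=0.5)
```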