Learning Dynamic Belief Graphs to Generalize on Text-Based Games
Authors: Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, Will Hamilton
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 500+ unique games from the TextWorld suite show that our best agent outperforms text-based baselines by an average of 24.2%. |
| Researcher Affiliation | Collaboration | University of Waterloo; Microsoft Research, Montréal; Charles University; Mila; McGill University; HEC Montréal; Vector Institute. Contact: eric.yuan@microsoft.com |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. It describes methods in text and uses figures to illustrate components. |
| Open Source Code | Yes | Code and dataset used: https://github.com/xingdi-eric-yuan/GATA-public |
| Open Datasets | Yes | We benchmark GATA on 500+ unique games generated by TextWorld [9], evaluating performance in a setting that requires generalization across different game configurations. ... Code and dataset used: https://github.com/xingdi-eric-yuan/GATA-public |
| Dataset Splits | Yes | We divide generated games, all of which have unique recipes and map configurations, into sets for training, validation, and test. ... We divide the games into four subsets with one difficulty level per subset. Each subset contains 100 training, 20 validation, and 20 test games, which are sampled from a distribution determined by their difficulty level. (A sketch of this split layout appears after the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., specific GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions software components like transformer-based models, R-GCNs, and PyTorch (in acknowledgements) but does not provide specific version numbers for these or other dependencies required for replication. |
| Experiment Setup | No | The paper does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings in the main text. It mentions using Double DQN combined with multi-step learning and prioritized experience replay, and sampling new games for episodes, but does not give specific values for these. (A sketch of a multi-step Double DQN update appears after the table.) |
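
The split scheme quoted in the Dataset Splits row (four difficulty subsets, each holding 100 training / 20 validation / 20 test games) can be summarized in a few lines of code. The sketch below is a minimal illustration, not the authors' tooling; the `GameSplit` container and the per-level layout are assumptions based only on the quoted text.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class GameSplit:
    """Per-difficulty game counts as quoted in the paper (hypothetical container)."""
    train: int
    valid: int
    test: int

# Four subsets, one difficulty level per subset,
# each with 100 training / 20 validation / 20 test games.
SPLITS: Dict[int, GameSplit] = {
    level: GameSplit(train=100, valid=20, test=20) for level in (1, 2, 3, 4)
}

total_games = sum(s.train + s.valid + s.test for s in SPLITS.values())
print(total_games)  # 560, consistent with the paper's "500+ unique games"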
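```

The Experiment Setup row notes that training combines Double DQN with multi-step learning and prioritized experience replay, without reporting hyperparameters. Below is a minimal PyTorch sketch of a multi-step Double DQN target; the network interfaces, the n-step return assembly, and all values (`gamma`, `n_steps`) are assumptions for illustration, not settings from the paper.

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_obs, done,
                      gamma=0.99, n_steps=3):
    """Multi-step Double DQN target (sketch; hyperparameters are assumed).

    rewards:  (batch, n_steps) rewards r_t .. r_{t+n-1}
    next_obs: encoded observations at step t+n
    done:     (batch,) 1.0 if the episode ended within the n steps
    """
    # Discounted n-step return: sum_{k=0}^{n-1} gamma^k * r_{t+k}
    discounts = gamma ** torch.arange(n_steps, dtype=rewards.dtype)
    n_step_return = (rewards * discounts).sum(dim=1)

    with torch.no_grad():
        # Double DQN: the online network selects the greedy action...
        best_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
        # ...and the target network evaluates it.
        bootstrap = target_net(next_obs).gather(1, best_actions).squeeze(1)

    return n_step_return + (gamma ** n_steps) * (1.0 - done) * bootstrap
```

In a prioritized experience replay buffer, the absolute TD error between this target and the online network's Q-value for the taken action would typically set each transition's sampling priority; the paper confirms the combination of techniques but not these implementation details.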