Learning Dynamic Belief Graphs to Generalize on Text-Based Games

Authors: Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, Will Hamilton

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on 500+ unique games from the TextWorld suite show that our best agent outperforms text-based baselines by an average of 24.2%. |
| Researcher Affiliation | Collaboration | University of Waterloo; Microsoft Research, Montréal; Charles University; Mila; McGill University; HEC Montréal; Vector Institute. Contact: eric.yuan@microsoft.com |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. It describes methods in prose and uses figures to illustrate components. |
| Open Source Code | Yes | Code and dataset used: https://github.com/xingdi-eric-yuan/GATA-public |
| Open Datasets | Yes | We benchmark GATA on 500+ unique games generated by TextWorld [9], evaluating performance in a setting that requires generalization across different game configurations. ... Code and dataset used: https://github.com/xingdi-eric-yuan/GATA-public |
| Dataset Splits | Yes | We divide generated games, all of which have unique recipes and map configurations, into sets for training, validation, and test. ... We divide the games into four subsets with one difficulty level per subset. Each subset contains 100 training, 20 validation, and 20 test games, which are sampled from a distribution determined by their difficulty level. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., specific GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions software components such as transformer-based models, R-GCNs, and PyTorch (in the acknowledgements), but it does not give version numbers for these or other dependencies required for replication. |
| Experiment Setup | No | The paper does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings in the main text. It mentions using Double DQN combined with multi-step learning and prioritized experience replay, and sampling new games for each episode, but not the corresponding values. (A hedged training-target sketch follows the table.) |
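
To make the reported splits concrete, below is a minimal Python sketch of partitioning generated games into the 100/20/20 train/validation/test subsets per difficulty level. Only the counts and the four difficulty levels come from the paper; the `games/difficulty_<k>` directory layout, the `.ulx` file extension, and the fixed seed are illustrative assumptions.

```python
import random
from pathlib import Path

# 100 training, 20 validation, 20 test games per difficulty level (per the paper).
SPLIT_SIZES = {"train": 100, "valid": 20, "test": 20}

def split_games(level_dir: Path, seed: int = 42) -> dict[str, list[Path]]:
    """Shuffle the games for one difficulty level and slice off each split."""
    games = sorted(level_dir.glob("*.ulx"))  # assumed file layout, not from the paper
    assert len(games) >= sum(SPLIT_SIZES.values()), "not enough games to split"
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(games)
    splits, start = {}, 0
    for name, size in SPLIT_SIZES.items():
        splits[name] = games[start:start + size]
        start += size
    return splits

if __name__ == "__main__":
    for level in range(1, 5):  # four difficulty levels, one subset each
        splits = split_games(Path(f"games/difficulty_{level}"))
        print(level, {name: len(games) for name, games in splits.items()})
```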
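Since the paper names Double DQN with multi-step learning and prioritized experience replay but leaves the hyperparameters unreported, the sketch below shows only the n-step Double DQN target computation in PyTorch. `GAMMA` and `N_STEP` are placeholder values, and the fixed discrete action space is a simplification (GATA actually scores variable lists of candidate text commands); this is not the authors' exact setup.

```python
import torch

GAMMA, N_STEP = 0.9, 3  # placeholder values; the paper does not report them

@torch.no_grad()
def double_dqn_target(online_net, target_net, rewards, next_obs, done):
    """n-step Double DQN target: the online network selects the bootstrap
    action, the target network evaluates it.

    rewards:  (batch, N_STEP) rewards along the sampled n-step transition
    next_obs: observation n steps ahead
    done:     (batch,) float mask, 1.0 where the episode has terminated
    """
    # Discounted sum of the n intermediate rewards.
    discounts = GAMMA ** torch.arange(N_STEP, dtype=torch.float32)
    n_step_return = (rewards * discounts).sum(dim=1)
    # Decouple action selection (online net) from evaluation (target net).
    best_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
    bootstrap = target_net(next_obs).gather(1, best_actions).squeeze(1)
    return n_step_return + (GAMMA ** N_STEP) * (1.0 - done) * bootstrap
```

Under prioritized experience replay, transitions would then be sampled in proportion to their absolute TD error against this target and reweighted with importance-sampling weights during the loss computation.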