Learning Dynamic Belief Graphs to Generalize on Text-Based Games

Authors: Ashutosh Adhikari, Xingdi Yuan, Marc-Alexandre Côté, Mikuláš Zelinka, Marc-Antoine Rondeau, Romain Laroche, Pascal Poupart, Jian Tang, Adam Trischler, Will Hamilton

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experiments on 500+ unique games from the TextWorld suite show that our best agent outperforms text-based baselines by an average of 24.2%. |
| Researcher Affiliation | Collaboration | University of Waterloo; Microsoft Research, Montréal; Charles University; Mila; McGill University; HEC Montréal; Vector Institute. Contact: eric.yuan@microsoft.com |
| Pseudocode | No | The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block. It describes methods in prose and uses figures to illustrate components. |
| Open Source Code | Yes | Code and dataset used: https://github.com/xingdi-eric-yuan/GATA-public |
| Open Datasets | Yes | We benchmark GATA on 500+ unique games generated by TextWorld [9], evaluating performance in a setting that requires generalization across different game configurations. ... Code and dataset used: https://github.com/xingdi-eric-yuan/GATA-public |
| Dataset Splits | Yes | We divide generated games, all of which have unique recipes and map configurations, into sets for training, validation, and test. ... We divide the games into four subsets with one difficulty level per subset. Each subset contains 100 training, 20 validation, and 20 test games, which are sampled from a distribution determined by their difficulty level. (A minimal split sketch follows the table.) |
| Hardware Specification | No | The paper does not specify the hardware used for experiments (e.g., specific GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions software components such as transformer-based models, R-GCNs, and PyTorch (in the acknowledgements), but it does not give version numbers for these or other dependencies required for replication. |
| Experiment Setup | No | The paper does not explicitly provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed system-level training settings in the main text. It mentions using Double DQN combined with multi-step learning and prioritized experience replay, and sampling new games for each episode, but not the corresponding values. (A hedged training-target sketch follows the table.) |
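
To make the reported splits concrete, below is a minimal Python sketch of partitioning generated games into the 100/20/20 train/validation/test subsets per difficulty level. Only the counts and the four difficulty levels come from the paper; the `games/difficulty_<k>` directory layout, the `.ulx` file extension, and the fixed seed are illustrative assumptions.

```python
import random
from pathlib import Path

# 100 training, 20 validation, 20 test games per difficulty level (per the paper).
SPLIT_SIZES = {"train": 100, "valid": 20, "test": 20}

def split_games(level_dir: Path, seed: int = 42) -> dict[str, list[Path]]:
    """Shuffle the games for one difficulty level and slice off each split."""
    games = sorted(level_dir.glob("*.ulx"))  # assumed file layout, not from the paper
    assert len(games) >= sum(SPLIT_SIZES.values()), "not enough games to split"
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(games)
    splits, start = {}, 0
    for name, size in SPLIT_SIZES.items():
        splits[name] = games[start:start + size]
        start += size
    return splits

if __name__ == "__main__":
    for level in range(1, 5):  # four difficulty levels, one subset each
        splits = split_games(Path(f"games/difficulty_{level}"))
        print(level, {name: len(games) for name, games in splits.items()})
```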
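Since the paper names Double DQN with multi-step learning and prioritized experience replay but leaves the hyperparameters unreported, the sketch below shows only the n-step Double DQN target computation in PyTorch. `GAMMA` and `N_STEP` are placeholder values, and the fixed discrete action space is a simplification (GATA actually scores variable lists of candidate text commands); this is not the authors' exact setup.

```python
import torch

GAMMA, N_STEP = 0.9, 3  # placeholder values; the paper does not report them

@torch.no_grad()
def double_dqn_target(online_net, target_net, rewards, next_obs, done):
    """n-step Double DQN target: the online network selects the bootstrap
    action, the target network evaluates it.

    rewards:  (batch, N_STEP) rewards along the sampled n-step transition
    next_obs: observation n steps ahead
    done:     (batch,) float mask, 1.0 where the episode has terminated
    """
    # Discounted sum of the n intermediate rewards.
    discounts = GAMMA ** torch.arange(N_STEP, dtype=torch.float32)
    n_step_return = (rewards * discounts).sum(dim=1)
    # Decouple action selection (online net) from evaluation (target net).
    best_actions = online_net(next_obs).argmax(dim=1, keepdim=True)
    bootstrap = target_net(next_obs).gather(1, best_actions).squeeze(1)
    return n_step_return + (GAMMA ** N_STEP) * (1.0 - done) * bootstrap
```

Under prioritized experience replay, transitions would then be sampled in proportion to their absolute TD error against this target and reweighted with importance-sampling weights during the loss computation.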