Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Authors: Yunqiu Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, Chengqi Zhang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents.
Researcher Affiliation | Collaboration | Yunqiu Xu (University of Technology Sydney, Yunqiu.Xu@student.uts.edu.au); Meng Fang (Tencent Robotics X, mfang@tencent.com); Ling Chen (University of Technology Sydney, Ling.Chen@uts.edu.au); Yali Du (University College London, yali.du@ucl.ac.uk); Joey Tianyi Zhou (IHPC, A*STAR, zhouty@ihpc.a-star.edu.sg); Chengqi Zhang (University of Technology Sydney, Chengqi.Zhang@uts.edu.au)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the method in narrative text and through diagrams.
Open Source Code | Yes | Our code is available at https://github.com/YunqiuXu/SHA-KG.
Open Datasets | Yes | We evaluate our method on a set of man-made games in Jericho game suite [20].
Dataset Splits | No | The paper describes training and evaluation on game environments but does not specify distinct training, validation, and test splits in the conventional supervised-learning sense. It refers to training interaction steps and to scores reported over finished episodes rather than explicit data splits for validation.
Hardware Specification | No | The paper mentions 'to reduce GPU cost' but does not provide specific hardware details such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions software components such as GATs, GRUs, Open Information Extraction (Open IE), and the Adam optimizer, but does not provide version numbers for any of them.
Experiment Setup | Yes | Training implementation. We follow the hyper-parameter setting of KG-A2C [3] except that we reduce the node embedding dimension in GATs from 50 to 25 to reduce GPU cost. We set d_high as 100, and d_low as 50. ... An episode will be terminated after 100 valid steps or game over / victory. For each game, an individual agent is trained for 10^6 interaction steps. The training data is collected from 32 environments in parallel. An optimization step is performed per 8 interaction steps via the Adam optimizer with the learning rate 0.003. ... All the quantitative results are averaged over five independent runs. (A hedged configuration sketch based on these values follows the table.)
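The training details quoted in the last row amount to a compact set of hyper-parameters plus an update schedule. The sketch below is not the authors' implementation (that lives in the linked SHA-KG repository); it is a minimal illustration, under stated assumptions, of how the reported values could be organized and how the one-optimization-step-per-8-interaction-steps cadence over 32 parallel environments might be wired up. The `agent` methods (`act`, `compute_loss`) and the batched `envs` interface are hypothetical placeholders, not APIs from the paper or its repository.

```python
from dataclasses import dataclass

import torch


@dataclass
class TrainingConfig:
    """Hyper-parameters as reported in the paper's training implementation."""
    node_embedding_dim: int = 25       # GAT node embedding dimension, reduced from 50 to 25
    d_high: int = 100                  # high-level representation size
    d_low: int = 50                    # low-level representation size
    max_valid_steps: int = 100         # episode ends after 100 valid steps or game over / victory
    total_interaction_steps: int = 10**6
    num_parallel_envs: int = 32        # training data collected from 32 environments in parallel
    steps_per_update: int = 8          # one optimization step per 8 interaction steps
    learning_rate: float = 3e-3        # Adam learning rate
    num_seeds: int = 5                 # results averaged over five independent runs


def train(agent, envs, config: TrainingConfig):
    """Skeleton of the reported training schedule.

    `agent` is assumed to expose `parameters()`, `act(observations)`, and
    `compute_loss(rollout)`; `envs` is assumed to be a batched text-game
    interface with `reset()` and `step(actions)`. Both are placeholders
    invented for this sketch, not part of the published code.
    """
    optimizer = torch.optim.Adam(agent.parameters(), lr=config.learning_rate)
    observations = envs.reset()
    rollout = []

    for step in range(config.total_interaction_steps):
        actions = agent.act(observations)                 # one action per parallel env
        observations, rewards, dones, infos = envs.step(actions)
        rollout.append((actions, rewards, dones, infos))

        # Perform one optimization step every `steps_per_update` interaction steps.
        if (step + 1) % config.steps_per_update == 0:
            loss = agent.compute_loss(rollout)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            rollout.clear()
```

Under this cadence, 10^6 interaction steps would correspond to 10^6 / 8 = 125,000 optimization steps; the quoted text does not say whether interaction steps are counted per environment or across the 32-environment batch, so that figure is an assumption rather than a number reported in the paper.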