Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Authors: Yunqiu Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, Chengqi Zhang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents.
Researcher Affiliation | Collaboration | Yunqiu Xu (University of Technology Sydney, Yunqiu.Xu@student.uts.edu.au); Meng Fang (Tencent Robotics X, mfang@tencent.com); Ling Chen (University of Technology Sydney, Ling.Chen@uts.edu.au); Yali Du (University College London, yali.du@ucl.ac.uk); Joey Tianyi Zhou (IHPC, A*STAR, zhouty@ihpc.a-star.edu.sg); Chengqi Zhang (University of Technology Sydney, Chengqi.Zhang@uts.edu.au)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the method in narrative text and through diagrams.
Open Source Code | Yes | Our code is available at https://github.com/YunqiuXu/SHA-KG.
Open Datasets | Yes | We evaluate our method on a set of man-made games in Jericho game suite [20].
Dataset Splits | No | The paper describes training and evaluation on game environments but does not specify distinct training, validation, and test splits in the conventional supervised-learning sense. It refers to training interaction steps and to scores reported over finished episodes rather than explicit data splits for validation.
Hardware Specification | No | The paper mentions 'to reduce GPU cost' but does not provide specific hardware details such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions software components such as GATs, GRUs, Open Information Extraction (Open IE), and the Adam optimizer, but does not provide version numbers for any of them.
Experiment Setup | Yes | Training implementation. We follow the hyper-parameter setting of KG-A2C [3] except that we reduce the node embedding dimension in GATs from 50 to 25 to reduce GPU cost. We set d_high as 100, and d_low as 50. ... An episode will be terminated after 100 valid steps or game over / victory. For each game, an individual agent is trained for 10^6 interaction steps. The training data is collected from 32 environments in parallel. An optimization step is performed per 8 interaction steps via the Adam optimizer with the learning rate 0.003. ... All the quantitative results are averaged over five independent runs. (A hedged configuration sketch based on these values follows the table.)
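The training details quoted in the last row amount to a compact set of hyper-parameters plus an update schedule. The sketch below is not the authors' implementation (that lives in the linked SHA-KG repository); it is a minimal illustration, under stated assumptions, of how the reported values could be organized and how the one-optimization-step-per-8-interaction-steps cadence over 32 parallel environments might be wired up. The `agent` methods (`act`, `compute_loss`) and the batched `envs` interface are hypothetical placeholders, not APIs from the paper or its repository.

```python
from dataclasses import dataclass

import torch


@dataclass
class TrainingConfig:
    """Hyper-parameters as reported in the paper's training implementation."""
    node_embedding_dim: int = 25       # GAT node embedding dimension, reduced from 50 to 25
    d_high: int = 100                  # high-level representation size
    d_low: int = 50                    # low-level representation size
    max_valid_steps: int = 100         # episode ends after 100 valid steps or game over / victory
    total_interaction_steps: int = 10**6
    num_parallel_envs: int = 32        # training data collected from 32 environments in parallel
    steps_per_update: int = 8          # one optimization step per 8 interaction steps
    learning_rate: float = 3e-3        # Adam learning rate
    num_seeds: int = 5                 # results averaged over five independent runs


def train(agent, envs, config: TrainingConfig):
    """Skeleton of the reported training schedule.

    `agent` is assumed to expose `parameters()`, `act(observations)`, and
    `compute_loss(rollout)`; `envs` is assumed to be a batched text-game
    interface with `reset()` and `step(actions)`. Both are placeholders
    invented for this sketch, not part of the published code.
    """
    optimizer = torch.optim.Adam(agent.parameters(), lr=config.learning_rate)
    observations = envs.reset()
    rollout = []

    for step in range(config.total_interaction_steps):
        actions = agent.act(observations)                 # one action per parallel env
        observations, rewards, dones, infos = envs.step(actions)
        rollout.append((actions, rewards, dones, infos))

        # Perform one optimization step every `steps_per_update` interaction steps.
        if (step + 1) % config.steps_per_update == 0:
            loss = agent.compute_loss(rollout)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            rollout.clear()
```

Under this cadence, 10^6 interaction steps would correspond to 10^6 / 8 = 125,000 optimization steps; the quoted text does not say whether interaction steps are counted per environment or across the 32-environment batch, so that figure is an assumption rather than a number reported in the paper.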