Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games

Authors: Yunqiu Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, Chengqi Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents.
Researcher Affiliation | Collaboration | Yunqiu Xu (University of Technology Sydney); Meng Fang (Tencent Robotics X); Ling Chen (University of Technology Sydney); Yali Du (University College London); Joey Tianyi Zhou (IHPC, A*STAR); Chengqi Zhang (University of Technology Sydney)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the methods in narrative text and through diagrams.
Open Source Code | Yes | Our code is available at https://github.com/YunqiuXu/SHA-KG.
Open Datasets | Yes | We evaluate our method on a set of man-made games in the Jericho game suite [20].
Dataset Splits | No | The paper describes training and evaluation on game environments but does not specify distinct training, validation, and test splits in the conventional supervised-learning sense. It refers to training interaction steps and to scores reported over finished episodes, rather than to explicit data splits for validation.
Hardware Specification | No | The paper mentions 'to reduce GPU cost' but does not provide specific hardware details such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions software components like 'GATs', 'GRUs', 'Open Information Extraction (Open IE)', and the 'Adam optimizer', but does not provide specific version numbers for any of them.
Experiment Setup | Yes | Training implementation. We follow the hyper-parameter setting of KG-A2C [3] except that we reduce the node embedding dimension in GATs from 50 to 25 to reduce GPU cost. We set d_high as 100, and d_low as 50. ... An episode will be terminated after 100 valid steps or game over / victory. For each game, an individual agent is trained for 10^6 interaction steps. The training data is collected from 32 environments in parallel. An optimization step is performed per 8 interaction steps via the Adam optimizer with the learning rate 0.003. ... All the quantitative results are averaged over five independent runs.
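The quoted experiment setup can be summarized as a small configuration sketch. This is an illustrative Python fragment assembled from the hyper-parameters reported above; the names (TrainConfig, num_optim_steps) are hypothetical and not taken from the paper's released code.

```python
# Hedged sketch of the training schedule reported in the Experiment Setup row.
# All field names below are illustrative; only the values come from the paper.
from dataclasses import dataclass


@dataclass
class TrainConfig:
    total_interaction_steps: int = 10**6  # per-game training budget
    parallel_envs: int = 32               # environments collected in parallel
    steps_per_optim: int = 8              # one optimizer step per 8 interaction steps
    learning_rate: float = 0.003          # Adam learning rate
    max_valid_steps: int = 100            # episode ends after 100 valid steps
    node_embedding_dim: int = 25          # reduced from KG-A2C's 50
    d_high: int = 100
    d_low: int = 50
    num_runs: int = 5                     # quantitative results averaged over 5 runs


def num_optim_steps(cfg: TrainConfig) -> int:
    """Total optimizer updates implied by the training budget."""
    return cfg.total_interaction_steps // cfg.steps_per_optim


print(num_optim_steps(TrainConfig()))  # 125000
```

Under these reported values, the schedule implies 125,000 Adam updates per trained agent.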