Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Authors: Yunqiu Xu, Meng Fang, Ling Chen, Yali Du, Joey Tianyi Zhou, Chengqi Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our method on a number of man-made benchmark games, and the experimental results demonstrate that our method performs better than existing text-based agents. |
| Researcher Affiliation | Collaboration | Yunqiu Xu, University of Technology Sydney, Yunqiu.Xu@student.uts.edu.au; Meng Fang, Tencent Robotics X, mfang@tencent.com; Ling Chen, University of Technology Sydney, Ling.Chen@uts.edu.au; Yali Du, University College London, yali.du@ucl.ac.uk; Joey Tianyi Zhou, IHPC, A*STAR, zhouty@ihpc.a-star.edu.sg; Chengqi Zhang, University of Technology Sydney, Chengqi.Zhang@uts.edu.au |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It describes the methods in narrative text and through diagrams. |
| Open Source Code | Yes | Our code is available at https://github.com/YunqiuXu/SHA-KG. |
| Open Datasets | Yes | We evaluate our method on a set of man-made games in Jericho game suite [20]. |
| Dataset Splits | No | The paper describes training and evaluation on game environments but does not specify distinct training, validation, and test dataset splits in the conventional sense for supervised learning tasks. It refers to training interaction steps and reporting scores over finished episodes, rather than explicit data splits for validation. |
| Hardware Specification | No | The paper mentions 'to reduce GPU cost' but does not provide specific hardware details such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions software components like 'GATs', 'GRUs', 'Open Information Extraction (Open IE)', and 'Adam optimizer', but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Training implementation. We follow the hyper-parameter setting of KG-A2C [3] except that we reduce the node embedding dimension in GATs from 50 to 25 to reduce GPU cost. We set d_high as 100, and d_low as 50. ... An episode will be terminated after 100 valid steps or game over / victory. For each game, an individual agent is trained for 10^6 interaction steps. The training data is collected from 32 environments in parallel. An optimization step is performed per 8 interaction steps via the Adam optimizer with the learning rate 0.003. ... All the quantitative results are averaged over five independent runs. |
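
The games used for evaluation come from the Jericho suite, which exposes each game through its `FrotzEnv` interface. Below is a minimal, hedged sketch of interacting with one such environment; the ROM path is a placeholder and the episode cap of 100 valid steps follows the setup quoted in the table. The exact calls the authors use in SHA-KG may differ.

```python
# Hedged sketch of stepping a Jericho game (not taken from the SHA-KG repo).
import random

from jericho import FrotzEnv

env = FrotzEnv("roms/zork1.z5")   # placeholder ROM path for a Jericho game
obs, info = env.reset()           # initial observation text and info dict

episode_score, done = 0, False
for step in range(100):           # episodes are capped at 100 valid steps
    valid_actions = env.get_valid_actions()   # Jericho's valid-action handicap
    action = random.choice(valid_actions) if valid_actions else "look"
    obs, reward, done, info = env.step(action)
    episode_score += reward
    if done:                      # game over or victory also ends the episode
        break

print("episode score:", episode_score)
```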
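
The training hyper-parameters quoted in the Experiment Setup row can be collected into a single configuration object, which makes the reported setup easier to reproduce. This is a hedged sketch under our own naming; the field names (`node_emb_dim`, `d_high`, `d_low`, etc.) and the placeholder network are illustrative and not necessarily those used in the authors' code.

```python
# Hedged sketch of the reported training configuration; names are ours,
# not necessarily those used in the SHA-KG repository.
from dataclasses import dataclass

import torch


@dataclass
class SHAKGTrainConfig:
    node_emb_dim: int = 25        # GAT node embedding, reduced from 50 to cut GPU cost
    d_high: int = 100             # high-level representation size (d_high)
    d_low: int = 50               # low-level representation size (d_low)
    max_episode_steps: int = 100  # terminate after 100 valid steps or game over / victory
    total_steps: int = 1_000_000  # interaction steps per game
    num_envs: int = 32            # parallel environments for data collection
    update_every: int = 8         # one optimization step per 8 interaction steps
    learning_rate: float = 0.003  # Adam learning rate
    num_runs: int = 5             # independent runs averaged for reported scores


cfg = SHAKGTrainConfig()
# Optimizer setup as described: Adam with the configured learning rate.
model = torch.nn.Linear(cfg.d_low, cfg.d_high)  # placeholder for the actual agent network
optimizer = torch.optim.Adam(model.parameters(), lr=cfg.learning_rate)
```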