Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Inherently Explainable Reinforcement Learning in Natural Language
Authors: Xiangyu Peng, Mark Riedl, Prithviraj Ammanabrolu
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that this agent provides significantly improved explanations over strong baselines, as rated by human participants generally unfamiliar with the environment, while also matching state-of-the-art task performance. |
| Researcher Affiliation | Collaboration | Xiangyu Peng, Chen Xing, Prafulla Kumar Choubey, Chien-Sheng Wu, Caiming Xiong; Georgia Institute of Technology; Allen Institute for AI; EMAIL, EMAIL |
| Pseudocode | No | The paper describes the architecture and processes in prose and diagrams (Figure 2) but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/xiangyu-peng/HEX-RL |
| Open Datasets | Yes | "A specially constructed dataset for question answering in text games Jericho QA is used to fine-tune ALBERT (Lan et al., 2019) to answer these questions (See Appendix A.3)" and "We compare HEX-RL with four strong state-of-art reinforcement learning agents focusing on contemporary agents that use knowledge graphs on an established test set of 9 games from the Jericho benchmark (Hausknecht et al., 2020)." |
| Dataset Splits | No | The paper mentions using 'a specially constructed dataset for question answering in text games Jericho QA... to fine-tune ALBERT' and an 'established test set of 9 games from the Jericho benchmark', but does not provide specific training/validation/test split percentages or sample counts for these datasets in the provided text. |
| Hardware Specification | No | The paper states that information regarding 'the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)' is included, but these specific hardware details are not present in the provided text. |
| Software Dependencies | No | The paper mentions models like ALBERT and GPT-2, and frameworks like Jericho, but does not provide specific version numbers for software dependencies such as programming languages, libraries, or deep learning frameworks used for implementation. |
| Experiment Setup | No | The paper describes training HEX-RL on two reward types ('game only' and 'game with intrinsic motivation') and refers to Appendix A.2 for details on A2C training, but specific numerical hyperparameter values (e.g., learning rate, batch size) are not provided in the main text. |