Inherently Explainable Reinforcement Learning in Natural Language

Authors: Xiangyu Peng, Mark Riedl, Prithviraj Ammanabrolu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that this agent provides significantly improved explanations over strong baselines, as rated by human participants generally unfamiliar with the environment, while also matching state-of-the-art task performance.
Researcher Affiliation | Collaboration | Xiangyu Peng and Mark Riedl (Georgia Institute of Technology); Prithviraj Ammanabrolu (Allen Institute for AI). {xpeng62,riedl}@gatech.edu, raja@allenai.org
Pseudocode | No | The paper describes the architecture and processes in prose and diagrams (Figure 2) but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/xiangyu-peng/HEX-RL
Open Datasets | Yes | "A specially constructed dataset for question answering in text games, Jericho QA, is used to fine-tune ALBERT (Lan et al., 2019) to answer these questions (See Appendix A.3)" and "We compare HEX-RL with four strong state-of-the-art reinforcement learning agents focusing on contemporary agents that use knowledge graphs on an established test set of 9 games from the Jericho benchmark (Hausknecht et al., 2020)." (An illustrative environment sketch follows the table.)
Dataset Splits | No | The paper mentions using "a specially constructed dataset for question answering in text games, Jericho QA," to fine-tune ALBERT, and an "established test set of 9 games from the Jericho benchmark," but does not provide training/validation/test split percentages or sample counts for these datasets in the provided text.
Hardware Specification | No | The paper's checklist claims to report "the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)," but those specific hardware details are not present in the provided text.
Software Dependencies | No | The paper mentions models such as ALBERT and GPT-2 and frameworks such as Jericho, but does not provide version numbers for software dependencies such as programming languages, libraries, or deep learning frameworks used for implementation.
Experiment Setup | No | The paper describes training HEX-RL on two reward types ("game only" and "game with intrinsic motivation") and refers to Appendix A.2 for details on A2C training, but specific numerical hyperparameter values (e.g., learning rate, batch size) are not provided in the main text. (A hypothetical configuration sketch follows below.)
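
For context on the Jericho benchmark referenced in the Open Datasets row, below is a minimal sketch of interacting with a Jericho text-game environment via the open-source `jericho` Python package. It is illustrative only: the game file path `zork1.z5` is a placeholder, the first-valid-action policy is a stand-in, and none of this is taken from the HEX-RL codebase.

```python
from jericho import FrotzEnv

# Placeholder path to a locally obtained Jericho-supported game file.
env = FrotzEnv("zork1.z5")

obs, info = env.reset()
done, total_reward = False, 0
while not done:
    # Jericho's valid-action handicap enumerates parser-accepted commands.
    valid_actions = env.get_valid_actions()
    if not valid_actions:
        break
    # Trivial policy for illustration: take the first valid action.
    # HEX-RL instead scores actions with a knowledge-graph-based A2C agent.
    obs, reward, done, info = env.step(valid_actions[0])
    total_reward += reward

print("Episode score:", total_reward)
```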
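Because the paper defers A2C training details to its Appendix A.2, a reproducer must supply their own hyperparameters. The sketch below shows one plausible way to organize such a configuration in Python; every value is a hypothetical placeholder chosen for illustration, not a number reported by the paper.

```python
from dataclasses import dataclass

@dataclass
class A2CConfig:
    # All values are illustrative placeholders, NOT the paper's settings.
    learning_rate: float = 1e-3
    discount_gamma: float = 0.9
    entropy_coef: float = 0.01      # entropy regularization weight
    value_loss_coef: float = 0.5    # critic loss weight
    num_parallel_envs: int = 8      # parallel Jericho environments
    max_steps_per_episode: int = 100
    total_env_steps: int = 1_000_000

config = A2CConfig()
print(config)
```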