Inherently Explainable Reinforcement Learning in Natural Language

Authors: Xiangyu Peng, Mark Riedl, Prithviraj Ammanabrolu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that this agent provides significantly improved explanations over strong baselines, as rated by human participants generally unfamiliar with the environment, while also matching state-of-the-art task performance.
Researcher Affiliation | Collaboration | Xiangyu Peng and Mark Riedl (Georgia Institute of Technology); Prithviraj Ammanabrolu (Allen Institute for AI). {xpeng62,riedl}@gatech.edu, raja@allenai.org
Pseudocode | No | The paper describes the architecture and processes in prose and diagrams (Figure 2) but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/xiangyu-peng/HEX-RL
Open Datasets | Yes | "A specially constructed dataset for question answering in text games, Jericho QA, is used to fine-tune ALBERT (Lan et al., 2019) to answer these questions (See Appendix A.3)" and "We compare HEX-RL with four strong state-of-the-art reinforcement learning agents focusing on contemporary agents that use knowledge graphs on an established test set of 9 games from the Jericho benchmark (Hausknecht et al., 2020)." (An illustrative environment sketch follows the table.)
Dataset Splits | No | The paper mentions using "a specially constructed dataset for question answering in text games, Jericho QA," to fine-tune ALBERT, and an "established test set of 9 games from the Jericho benchmark," but does not provide training/validation/test split percentages or sample counts for these datasets in the provided text.
Hardware Specification | No | The paper's checklist claims to report "the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)," but those specific hardware details are not present in the provided text.
Software Dependencies | No | The paper mentions models such as ALBERT and GPT-2 and frameworks such as Jericho, but does not provide version numbers for software dependencies such as programming languages, libraries, or deep learning frameworks used for implementation.
Experiment Setup | No | The paper describes training HEX-RL on two reward types ("game only" and "game with intrinsic motivation") and refers to Appendix A.2 for details on A2C training, but specific numerical hyperparameter values (e.g., learning rate, batch size) are not provided in the main text. (A hypothetical configuration sketch follows below.)
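
For context on the Jericho benchmark referenced in the Open Datasets row, below is a minimal sketch of interacting with a Jericho text-game environment via the open-source `jericho` Python package. It is illustrative only: the game file path `zork1.z5` is a placeholder, the first-valid-action policy is a stand-in, and none of this is taken from the HEX-RL codebase.

```python
from jericho import FrotzEnv

# Placeholder path to a locally obtained Jericho-supported game file.
env = FrotzEnv("zork1.z5")

obs, info = env.reset()
done, total_reward = False, 0
while not done:
    # Jericho's valid-action handicap enumerates parser-accepted commands.
    valid_actions = env.get_valid_actions()
    if not valid_actions:
        break
    # Trivial policy for illustration: take the first valid action.
    # HEX-RL instead scores actions with a knowledge-graph-based A2C agent.
    obs, reward, done, info = env.step(valid_actions[0])
    total_reward += reward

print("Episode score:", total_reward)
```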
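Because the paper defers A2C training details to its Appendix A.2, a reproducer must supply their own hyperparameters. The sketch below shows one plausible way to organize such a configuration in Python; every value is a hypothetical placeholder chosen for illustration, not a number reported by the paper.

```python
from dataclasses import dataclass

@dataclass
class A2CConfig:
    # All values are illustrative placeholders, NOT the paper's settings.
    learning_rate: float = 1e-3
    discount_gamma: float = 0.9
    entropy_coef: float = 0.01      # entropy regularization weight
    value_loss_coef: float = 0.5    # critic loss weight
    num_parallel_envs: int = 8      # parallel Jericho environments
    max_steps_per_episode: int = 100
    total_env_steps: int = 1_000_000

config = A2CConfig()
print(config)
```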