Explicable Reward Design for Reinforcement Learning Agents

Authors: Rati Devidze, Goran Radanovic, Parameswaran Kamalaruban, Adish Singla

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on two navigation tasks demonstrate the effectiveness of EXPRD in designing explicable reward functions.
Researcher Affiliation	Academia	Rati Devidze¹, Goran Radanovic¹, Parameswaran Kamalaruban², Adish Singla¹. ¹Max Planck Institute for Software Systems (MPI-SWS), Saarbrücken, Germany; ²The Alan Turing Institute, London, UK
Pseudocode	Yes	Algorithm 1 Iterative Greedy Algorithm for EXPRD
Open Source Code	Yes	GitHub repo: https://github.com/adishs/neurips2021_explicable-reward-design_code.
Open Datasets	No	The paper describes custom-built simulation environments (ROOMSNAVENV and LINEKEYNAVENV) rather than using pre-existing public datasets. It does not provide access information for these environments as datasets.
Dataset Splits	Yes	All the results are reported as average over 40 runs and convergence plots show mean with standard error bars.
Hardware Specification	No	The paper states that hardware details are provided in the Appendix of the supplementary material, which is not part of the provided text for analysis.
Software Dependencies	No	The paper states that software dependency details are provided in the Appendix of the supplementary material, which is not part of the provided text for analysis. It mentions using "standard Q-learning method" but no specific software versions.
Experiment Setup	Yes	We use standard Q-learning method for the agent with a learning rate 0.5 and exploration factor 0.1 [7]. During training, the agent receives rewards based on R̂, however, is evaluated based on R. A training episode ends when the maximum steps (set to 50) is reached or an agent's action terminates the episode. All the results are reported as average over 40 runs and convergence plots show mean with standard error bars.
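
The experiment setup quoted in the last row can be illustrated with a minimal tabular Q-learning sketch. The learning rate (0.5), exploration factor (0.1), step cap (50), and number of runs (40) come from the quoted text. Everything else is an assumption made for illustration: the toy chain environment (it is not ROOMSNAVENV or LINEKEYNAVENV), the discount factor, the number of episodes, the use of discounted return as the evaluation metric, and the stand-in designed reward, which is simple potential-based shaping and not an output of EXPRD. This is not the authors' code.

```python
import random
import statistics

ALPHA = 0.5       # learning rate (from the quoted setup)
EPSILON = 0.1     # exploration factor (from the quoted setup)
MAX_STEPS = 50    # episode step cap (from the quoted setup)
N_RUNS = 40       # independent runs (from the quoted setup)
GAMMA = 0.95      # ASSUMPTION: discount factor is not given in the quoted text
N_EPISODES = 30   # ASSUMPTION: training length chosen for this toy example
N_STATES = 6      # ASSUMPTION: hypothetical chain, not the paper's environments
ACTIONS = (-1, +1)  # move left / move right


def step(state, action):
    """Hypothetical chain dynamics: reaching the rightmost state ends the episode."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, nxt == N_STATES - 1


def true_reward(nxt):
    """Original reward R, used only for evaluation: 1 on reaching the goal."""
    return 1.0 if nxt == N_STATES - 1 else 0.0


def designed_reward(state, nxt):
    """Stand-in for R-hat: R plus potential-based shaping with phi(s) = 0.1 * s."""
    return true_reward(nxt) + GAMMA * 0.1 * nxt - 0.1 * state


def greedy(q, state, rng):
    """Greedy action index with random tie-breaking."""
    best = max(q[state])
    return rng.choice([a for a in range(len(ACTIONS)) if q[state][a] == best])


def run_episode(q, rng, learn):
    """Run one episode from state 0 and return the discounted return under R.

    With learn=True the agent acts epsilon-greedily and updates Q using R-hat;
    with learn=False it acts greedily and Q is left unchanged.
    """
    state, ret_true = 0, 0.0
    for t in range(MAX_STEPS):
        if learn and rng.random() < EPSILON:
            a = rng.randrange(len(ACTIONS))
        else:
            a = greedy(q, state, rng)
        nxt, done = step(state, ACTIONS[a])
        ret_true += (GAMMA ** t) * true_reward(nxt)
        if learn:
            bootstrap = 0.0 if done else GAMMA * max(q[nxt])
            target = designed_reward(state, nxt) + bootstrap
            q[state][a] += ALPHA * (target - q[state][a])
        state = nxt
        if done:
            break
    return ret_true


def one_run(seed):
    """Train on R-hat and record the greedy policy's return under R after each episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    curve = []
    for _ in range(N_EPISODES):
        run_episode(q, rng, learn=True)
        curve.append(run_episode(q, rng, learn=False))
    return curve


def mean_and_stderr(values):
    """Sample mean and standard error of the mean."""
    return statistics.fmean(values), statistics.stdev(values) / len(values) ** 0.5


curves = [one_run(seed) for seed in range(N_RUNS)]
final_mean, final_stderr = mean_and_stderr([c[-1] for c in curves])
print(f"final return under R: {final_mean:.3f} +/- {final_stderr:.3f}")
```

Keeping the training signal (`designed_reward`) separate from the evaluation signal (`true_reward`) mirrors the quoted protocol, where the agent learns from the designed reward but is scored on the original one. Averaging `curves` per episode index with `mean_and_stderr` would give the kind of mean-with-standard-error convergence curve the setup describes.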