Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks
Authors: Sungryull Sohn, Sungtae Lee, Jongwook Choi, Harm H. van Seijen, Mehdi Fatemi, Honglak Lee
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiment in a tabular RL setting demonstrates that the SP constraint can significantly reduce the trajectory space of policy. As a result, our constraint enables more sample-efficient learning by suppressing redundant exploration and exploitation. Our experiments on MiniGrid, DeepMind Lab, Atari, and Fetch show that the proposed method significantly improves proximal policy optimization (PPO) and outperforms existing novelty-seeking exploration methods including count-based exploration even in continuous control tasks, indicating that it improves the sample efficiency by preventing the agent from taking redundant actions. |
| Researcher Affiliation | Collaboration | ¹University of Michigan, ²LG AI Research, ³Yonsei University, ⁴Microsoft Research. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states 'We used the publicly available codebase (Savinov et al., 2018b) to obtain the baseline results.' but does not provide a link or statement about their own code for the proposed method. |
| Open Datasets | Yes | We evaluate our SPRL on four challenging domains: MiniGrid (Chevalier-Boisvert et al., 2018), DeepMind Lab (Beattie et al., 2016), Atari (Bellemare et al., 2013), and Fetch (Plappert et al., 2018). |
| Dataset Splits | No | The paper does not specify training/validation/test dataset splits or percentages, as would be typical in supervised learning; its experiments are conducted in reinforcement learning environments. |
| Hardware Specification | No | The paper mentions '32 parallel environments' and '12 parallel environments' when comparing against other methods (RND, SIL), implying distributed training, but it does not report specific hardware details such as the GPU or CPU models used for its experiments. |
| Software Dependencies | No | The paper mentions software like 'PPO', 'episodic curiosity (ECO)', 'intrinsic curiosity module (ICM)', and 'GT-Grid', and states 'We used the publicly available codebase (Savinov et al., 2018b) to obtain the baseline results.' However, it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We used the same hyperparameters for all the tasks in a given domain; the details are described in the Appendix. We used the standard domains and tasks for reproducibility. ... See Appendix D, Appendix E, Appendix F, and Appendix G for more details of MiniGrid, DeepMind Lab, Atari, and Fetch, respectively. |