Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards

Authors: Susan Amin, Maziar Gomrokchi, Hossein Aboutalebi, Harsh Satija, Doina Precup

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.
Researcher Affiliation | Academia | 1 Department of Computer Science, McGill University, Montréal, Québec, Canada; 2 Mila - Québec Artificial Intelligence Institute, Montréal, Québec, Canada; 3 Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada; 4 Waterloo Artificial Intelligence Institute, University of Waterloo, Waterloo, Ontario, Canada.
Pseudocode | Yes | Algorithm 1 presents the PolyRL pseudocode. The method of action sampling is provided in Algorithm 2 in the Supplementary Information (Section 3). (An illustrative action-sampling sketch follows the table.)
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | The sets of experiments involving the learning methods DDPG and SAC are performed in MuJoCo high-dimensional continuous control tasks SparseHopper-V2 (A ⊂ R^3, S ⊂ R^11), SparseHalfCheetah-V2 (A ⊂ R^6, S ⊂ R^17), and SparseAnt-V2 (A ⊂ R^8, S ⊂ R^111). (A sketch for checking these dimensions in Gym follows the table.)
Dataset Splits | No | The paper describes the tasks and settings but does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages or sample counts) for any of the environments used.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models) used to run the experiments.
Software Dependencies | No | The paper mentions algorithms like Q-learning, SAC, and DDPG, and the use of "Gym environments", but does not specify version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | The environment in our 2D sparse-reward navigation tasks either consists of only one 400x400 chamber (goal reward +1000), or a 50x50 room encapsulated by a 100x100 chamber. ... We integrate the PolyRL exploration algorithm with the Q-learning method with linear function approximation (learning rate = 0.01)... In the sparse MuJoCo tasks, the agent receives a reward of +1 only when it crosses a target distance λ, termed the sparsity threshold. (A sparse-reward wrapper sketch follows the table.)
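Related to the Pseudocode row: the paper's Algorithm 2 samples actions so that the agent's trajectory stays locally persistent rather than diffusing like a random walk. The sketch below is only a hedged, simplified illustration of correlated action sampling in a bounded continuous action space; the function name `persistent_action` and all of its parameters are hypothetical, and this is not the authors' Algorithm 2 (which controls the turning angle between consecutive trajectory segments).

```python
import numpy as np


def persistent_action(prev_action, action_dim, action_bound=1.0,
                      persistence=0.9, noise_scale=0.1, rng=None):
    """Hypothetical sketch of correlated (persistent) action sampling.

    Keeps each new action close in direction to the previous one, so the
    induced trajectory is locally persistent. NOT the authors' Algorithm 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    if prev_action is None:
        # First step: draw a uniform random action within the bounds.
        return rng.uniform(-action_bound, action_bound, size=action_dim)
    proposal = persistence * prev_action + noise_scale * rng.standard_normal(action_dim)
    return np.clip(proposal, -action_bound, action_bound)


# Example: a short exploratory rollout of correlated actions.
action = None
for _ in range(5):
    action = persistent_action(action, action_dim=3)
    print(action)
```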
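Related to the Open Datasets row: the action/state dimensions quoted there (e.g. A ⊂ R^3, S ⊂ R^11 for Hopper) can be checked against the standard Gym MuJoCo tasks that the sparse variants are built on. A minimal sketch, assuming `gym` with `mujoco-py` is installed; the sparse environment IDs themselves are not part of the standard Gym registry, and no library versions are stated in the paper.

```python
import gym  # assumes the old-style Gym API with mujoco-py installed

# Base locomotion tasks underlying the sparse variants used in the paper.
for env_id in ["Hopper-v2", "HalfCheetah-v2", "Ant-v2"]:
    env = gym.make(env_id)
    print(env_id,
          "action dim:", env.action_space.shape[0],
          "state dim:", env.observation_space.shape[0])
    env.close()
```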
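Related to the Experiment Setup row: one way to reproduce the sparse MuJoCo reward described there is to wrap a dense locomotion task so that the agent receives +1 only when its forward displacement first exceeds the sparsity threshold λ. The wrapper below is a hedged sketch under the old mujoco-py Gym API (the `sim.data.qpos[0]` torso-position access is an assumption), not the authors' implementation.

```python
import gym  # old-style Gym API with mujoco-py; versions unspecified in the paper


class SparseRewardWrapper(gym.Wrapper):
    """Assumed sparse-reward wrapper, not the authors' code.

    Gives +1 the first time the agent's forward displacement exceeds the
    sparsity threshold `lam`, and 0 otherwise.
    """

    def __init__(self, env, lam=1.0):
        super().__init__(env)
        self.lam = lam

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Torso x-position; the qpos[0] layout is an assumption for mujoco-py tasks.
        self._start_x = self.env.unwrapped.sim.data.qpos[0]
        self._crossed = False
        return obs

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # discard the dense reward
        x = self.env.unwrapped.sim.data.qpos[0]
        just_crossed = (x - self._start_x) >= self.lam and not self._crossed
        self._crossed = self._crossed or just_crossed
        return obs, (1.0 if just_crossed else 0.0), done, info


# Usage (assumed): env = SparseRewardWrapper(gym.make("Hopper-v2"), lam=1.0)
```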