Locally Persistent Exploration in Continuous Control Tasks with Sparse Rewards

Authors: Susan Amin, Maziar Gomrokchi, Hossein Aboutalebi, Harsh Satija, Doina Precup

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide empirical evaluations of our approach in a simulated 2D navigation task, as well as higher-dimensional MuJoCo continuous control locomotion tasks with sparse rewards.
Researcher Affiliation | Academia | 1 Department of Computer Science, McGill University, Montréal, Québec, Canada; 2 Mila - Québec Artificial Intelligence Institute, Montréal, Québec, Canada; 3 Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada; 4 Waterloo Artificial Intelligence Institute, University of Waterloo, Waterloo, Ontario, Canada.
Pseudocode | Yes | Algorithm 1 presents the PolyRL pseudocode. The method of action sampling is provided in Algorithm 2 in the Supplementary Information (Section 3). (An illustrative action-sampling sketch follows the table.)
Open Source Code | No | The paper does not provide any explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | The sets of experiments involving the learning methods DDPG and SAC are performed in MuJoCo high-dimensional continuous control tasks SparseHopper-V2 (A ⊂ R^3, S ⊂ R^11), SparseHalfCheetah-V2 (A ⊂ R^6, S ⊂ R^17), and SparseAnt-V2 (A ⊂ R^8, S ⊂ R^111). (A sketch for checking these dimensions in Gym follows the table.)
Dataset Splits | No | The paper describes the tasks and settings but does not explicitly provide details about training, validation, or test dataset splits (e.g., percentages or sample counts) for any of the environments used.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models) used to run the experiments.
Software Dependencies | No | The paper mentions algorithms like Q-learning, SAC, and DDPG, and the use of "Gym environments", but does not specify version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | The environment in our 2D sparse-reward navigation tasks either consists of only one 400x400 chamber (goal reward +1000), or a 50x50 room encapsulated by a 100x100 chamber. ... We integrate the PolyRL exploration algorithm with the Q-learning method with linear function approximation (learning rate = 0.01)... In the sparse MuJoCo tasks, the agent receives a reward of +1 only when it crosses a target distance λ, termed the sparsity threshold. (A sparse-reward wrapper sketch follows the table.)
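Related to the Pseudocode row: the paper's Algorithm 2 samples actions so that the agent's trajectory stays locally persistent rather than diffusing like a random walk. The sketch below is only a hedged, simplified illustration of correlated action sampling in a bounded continuous action space; the function name `persistent_action` and all of its parameters are hypothetical, and this is not the authors' Algorithm 2 (which controls the turning angle between consecutive trajectory segments).

```python
import numpy as np


def persistent_action(prev_action, action_dim, action_bound=1.0,
                      persistence=0.9, noise_scale=0.1, rng=None):
    """Hypothetical sketch of correlated (persistent) action sampling.

    Keeps each new action close in direction to the previous one, so the
    induced trajectory is locally persistent. NOT the authors' Algorithm 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    if prev_action is None:
        # First step: draw a uniform random action within the bounds.
        return rng.uniform(-action_bound, action_bound, size=action_dim)
    proposal = persistence * prev_action + noise_scale * rng.standard_normal(action_dim)
    return np.clip(proposal, -action_bound, action_bound)


# Example: a short exploratory rollout of correlated actions.
action = None
for _ in range(5):
    action = persistent_action(action, action_dim=3)
    print(action)
```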
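Related to the Open Datasets row: the action/state dimensions quoted there (e.g. A ⊂ R^3, S ⊂ R^11 for Hopper) can be checked against the standard Gym MuJoCo tasks that the sparse variants are built on. A minimal sketch, assuming `gym` with `mujoco-py` is installed; the sparse environment IDs themselves are not part of the standard Gym registry, and no library versions are stated in the paper.

```python
import gym  # assumes the old-style Gym API with mujoco-py installed

# Base locomotion tasks underlying the sparse variants used in the paper.
for env_id in ["Hopper-v2", "HalfCheetah-v2", "Ant-v2"]:
    env = gym.make(env_id)
    print(env_id,
          "action dim:", env.action_space.shape[0],
          "state dim:", env.observation_space.shape[0])
    env.close()
```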
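Related to the Experiment Setup row: one way to reproduce the sparse MuJoCo reward described there is to wrap a dense locomotion task so that the agent receives +1 only when its forward displacement first exceeds the sparsity threshold λ. The wrapper below is a hedged sketch under the old mujoco-py Gym API (the `sim.data.qpos[0]` torso-position access is an assumption), not the authors' implementation.

```python
import gym  # old-style Gym API with mujoco-py; versions unspecified in the paper


class SparseRewardWrapper(gym.Wrapper):
    """Assumed sparse-reward wrapper, not the authors' code.

    Gives +1 the first time the agent's forward displacement exceeds the
    sparsity threshold `lam`, and 0 otherwise.
    """

    def __init__(self, env, lam=1.0):
        super().__init__(env)
        self.lam = lam

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        # Torso x-position; the qpos[0] layout is an assumption for mujoco-py tasks.
        self._start_x = self.env.unwrapped.sim.data.qpos[0]
        self._crossed = False
        return obs

    def step(self, action):
        obs, _, done, info = self.env.step(action)  # discard the dense reward
        x = self.env.unwrapped.sim.data.qpos[0]
        just_crossed = (x - self._start_x) >= self.lam and not self._crossed
        self._crossed = self._crossed or just_crossed
        return obs, (1.0 if just_crossed else 0.0), done, info


# Usage (assumed): env = SparseRewardWrapper(gym.make("Hopper-v2"), lam=1.0)
```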