Curriculum-guided Hindsight Experience Replay
Authors: Meng Fang, Tianyi Zhou, Yali Du, Lei Han, Zhengyou Zhang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CHER and compare to state-of-the-art baselines on several challenging robotic manipulation tasks in simulated MuJoCo environments [Todorov et al., 2012]. In particular, we will use a simple Fetch environment as a toy example and Shadow Dexterous Hand environments from OpenAI Gym [Brockman et al., 2016]. |
| Researcher Affiliation | Collaboration | (1) Tencent Robotics X; (2) Paul G. Allen School of Computer Science & Engineering, University of Washington; (3) University College London |
| Pseudocode | Yes | Algorithm 1 STOCHASTIC-GREEDY(k, m, λ) and Algorithm 2 Curriculum-guided HER (CHER) are provided in the paper. |
| Open Source Code | Yes | Our code is available at https://github.com/mengf1/CHER. |
| Open Datasets | Yes | We evaluate CHER and compare to state-of-the-art baselines on several challenging robotic manipulation tasks in simulated MuJoCo environments [Todorov et al., 2012]. In particular, we will use a simple Fetch environment as a toy example and Shadow Dexterous Hand environments from OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with specific percentages or counts. It mentions training for 50 epochs and evaluating policies after each epoch with test rollouts, but no distinct validation split is specified. |
| Hardware Specification | No | The paper mentions: 'For all environments except FetchReach, we train policies on a single machine with 20 CPU cores. Each core generates experiences by using two parallel rollouts with MPI for synchronization.' It does not specify the CPU model or any GPU information. |
| Software Dependencies | No | The paper mentions using 'DDPG', 'HER', 'OpenAI Gym', and 'MuJoCo environments' but does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | For all environments except FetchReach, we train policies on a single machine with 20 CPU cores. Each core generates experiences using two parallel rollouts with MPI for synchronization. We train each agent for 50 epochs with batch size 64. Hyperparameters are nearly the same as in Andrychowicz et al. [2017]. In CHER, we use |B| = 128, |A| = k = 64 and |b| = m = 3 for Algorithm 1. For the tasks in this paper, we use an exponentially increasing λ over the course of training, i.e., λ = (1 + η)^γ · λ0 (Eq. 7), where η ∈ [0, 1] is a learning pace controlling the progress of the curriculum, γ is the episode index of the off-policy RL, and λ0 is the initial weight for proximity, which should be relatively small. (Hedged sketches of this λ schedule and of a stochastic-greedy selection step follow the table.) |
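The λ schedule in Eq. (7) above is simple enough to state as code. A minimal sketch, assuming illustrative names (`curriculum_lambda`, `lambda0`, `eta`, `episode`) that are not taken from the released CHER repository:

```python
# Sketch of the exponentially increasing proximity weight from Eq. (7):
# lambda = (1 + eta)^episode * lambda0. Names here are illustrative only.

def curriculum_lambda(lambda0: float, eta: float, episode: int) -> float:
    """Return the curriculum weight for the given off-policy RL episode."""
    assert 0.0 <= eta <= 1.0, "eta is a learning pace in [0, 1]"
    return (1.0 + eta) ** episode * lambda0

# A small initial weight grows exponentially as training progresses.
for ep in (0, 10, 50):
    print(ep, curriculum_lambda(lambda0=1e-3, eta=0.05, episode=ep))
```

Because λ weights the proximity term, a small λ0 lets early training favor diverse goals and later training favor goals close to the desired ones.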
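For the STOCHASTIC-GREEDY(k, m, λ) routine named in the Pseudocode row, the following is a hedged, generic sketch of sampled greedy subset selection under the reported sizes (|B| = 128 candidates, k = 64 selected, |b| = m = 3 sampled per step); the `score` oracle is a placeholder and does not reproduce the paper's proximity-plus-diversity objective.

```python
# Generic stochastic-greedy subset selection: at each step, sample m of the
# remaining candidates and add the one with the largest marginal gain.
# `score` is a stand-in set-function oracle, not the CHER objective.
import random
from typing import Callable, List, Set

def stochastic_greedy(candidates: List[int],
                      score: Callable[[Set[int]], float],
                      k: int = 64,
                      m: int = 3) -> Set[int]:
    selected: Set[int] = set()
    remaining = set(candidates)
    while len(selected) < k and remaining:
        sample = random.sample(sorted(remaining), min(m, len(remaining)))
        best = max(sample, key=lambda e: score(selected | {e}) - score(selected))
        selected.add(best)
        remaining.remove(best)
    return selected

# Example with the paper's reported sizes: pick 64 of 128 candidate transitions.
chosen = stochastic_greedy(list(range(128)), score=lambda s: float(len(s)), k=64, m=3)
```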