Exploration via Hindsight Goal Generation
Authors: Zhizhou Ren, Kefan Dong, Yuan Zhou, Qiang Liu, Jian Peng
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have extensively evaluated our goal generation algorithm on a number of robotic manipulation tasks and demonstrated substantial improvement over the original HER in terms of sample efficiency. |
| Researcher Affiliation | Academia | Zhizhou Ren, Kefan Dong (Institute for Interdisciplinary Information Sciences, Tsinghua University; Department of Computer Science, University of Illinois at Urbana-Champaign) {rzz16, dkf16}@mails.tsinghua.edu.cn; Yuan Zhou (Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign) yuanz@illinois.edu; Qiang Liu (Department of Computer Science, University of Texas at Austin) lqiang@cs.utexas.edu; Jian Peng (Department of Computer Science, University of Illinois at Urbana-Champaign) jianpeng@illinois.edu |
| Pseudocode | Yes | Algorithm 1 Exploration via Hindsight Goal Generation (HGG) (a hedged sketch of the goal-selection step appears after this table) |
| Open Source Code | Yes | Our code is available at https://github.com/Stilwell-Git/Hindsight-Goal-Generation. |
| Open Datasets | Yes | Our experiment environments are based on the standard robotic manipulation environments in the OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper mentions training and testing but does not specify explicit validation dataset splits or how validation was handled. |
| Hardware Specification | Yes | All experiments are done on a server with 1 NVIDIA V100 GPU and 48 Intel Xeon E5-2698 v4 CPUs. |
| Software Dependencies | No | The paper mentions using the DDPG algorithm and Adam optimizer, and states that hyperparameters are kept the same as in OpenAI Baselines, but it does not provide specific version numbers for any software components (e.g., Python, PyTorch, TensorFlow, Gym). |
| Experiment Setup | Yes | We use the DDPG algorithm with a batch size of 256. For all environments, we use a learning rate of 1e-3 for both the actor and critic networks, trained with the Adam optimizer. All experiments are run for 5000 epochs, with 50 rollout steps and 50 training steps per epoch. We use a discount factor of 0.98 for Fetch and 0.95 for Hand environments. The Lipschitz constant L for HGG is set to 5.0 and the distance weight c is set to 3.0 (these values are collected into a configuration sketch below). |
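
The Pseudocode row above refers to Algorithm 1 (HGG) in the paper. The following is a minimal Python sketch of its central idea only: rather than always training toward the distant target goals, the agent selects intermediate hindsight goals from achieved states in the replay buffer, trading off distance to the targets (weight c) against the current value estimate (scaled by the Lipschitz constant L). The helper names (`generate_hindsight_goals`, `value_fn`, `achieved_goals`) are illustrative assumptions rather than the authors' API, and the paper solves a bipartite matching over full trajectories rather than the independent per-goal argmin shown here.

```python
import numpy as np

def generate_hindsight_goals(target_goals, achieved_goals, value_fn, c=3.0, L=5.0):
    """Hedged sketch of HGG goal selection (not the authors' implementation).

    For each desired target goal, pick an already-achieved goal from the
    replay buffer that is close to the target (weighted by c) while having
    a high value estimate (scaled by 1/L); the selected goals serve as the
    intermediate training goals for the next round of rollouts.
    """
    intermediate_goals = []
    for g_target in target_goals:
        # Distance-to-target term weighted by c, minus a value bonus scaled by 1/L.
        scores = [
            c * np.linalg.norm(g_target - g_achieved) - value_fn(g_achieved) / L
            for g_achieved in achieved_goals
        ]
        intermediate_goals.append(achieved_goals[int(np.argmin(scores))])
    return intermediate_goals
```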
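The Experiment Setup row lists the reported hyperparameters; gathered into one place they look roughly as follows. This is only a convenience sketch: the key names are illustrative and do not correspond to identifiers in the authors' repository.

```python
# Hyperparameters as reported in the paper, collected into one mapping.
# Key names (e.g. "actor_lr", "hgg_lipschitz_L") are illustrative labels.
HGG_CONFIG = {
    "algorithm": "DDPG",
    "batch_size": 256,
    "actor_lr": 1e-3,
    "critic_lr": 1e-3,
    "optimizer": "Adam",
    "epochs": 5000,
    "rollout_steps_per_epoch": 50,
    "training_steps_per_epoch": 50,
    "discount_factor": {"Fetch": 0.98, "Hand": 0.95},
    "hgg_lipschitz_L": 5.0,
    "hgg_distance_weight_c": 3.0,
}
```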