Exploration via Hindsight Goal Generation
Authors: Zhizhou Ren, Kefan Dong, Yuan Zhou, Qiang Liu, Jian Peng
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We have extensively evaluated our goal generation algorithm on a number of robotic manipulation tasks and demonstrated substantial improvement over the original HER in terms of sample efficiency. |
| Researcher Affiliation | Academia | Zhizhou Ren, Kefan Dong (Institute for Interdisciplinary Information Sciences, Tsinghua University; Department of Computer Science, University of Illinois at Urbana-Champaign) {rzz16, dkf16}@mails.tsinghua.edu.cn; Yuan Zhou (Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign) yuanz@illinois.edu; Qiang Liu (Department of Computer Science, University of Texas at Austin) lqiang@cs.utexas.edu; Jian Peng (Department of Computer Science, University of Illinois at Urbana-Champaign) jianpeng@illinois.edu |
| Pseudocode | Yes | Algorithm 1 Exploration via Hindsight Goal Generation (HGG) (a hedged sketch of the goal-selection step appears after this table) |
| Open Source Code | Yes | Our code is available at https://github.com/Stilwell-Git/Hindsight-Goal-Generation. |
| Open Datasets | Yes | Our experiment environments are based on the standard robotic manipulation environments in the OpenAI Gym (Brockman et al., 2016). |
| Dataset Splits | No | The paper mentions training and testing but does not specify explicit validation dataset splits or how validation was handled. |
| Hardware Specification | Yes | All experiments are done on a server with 1 NVIDIA V100 GPU and 48 Intel Xeon E5-2698 v4 CPUs. |
| Software Dependencies | No | The paper mentions using the DDPG algorithm and Adam optimizer, and states that hyperparameters are kept the same as in OpenAI Baselines, but it does not provide specific version numbers for any software components (e.g., Python, PyTorch, TensorFlow, Gym). |
| Experiment Setup | Yes | We use the DDPG algorithm with a batch size of 256. For all environments, we use a learning rate of 1e-3 for both the actor and critic networks, trained with the Adam optimizer. All experiments are run for 5000 epochs, with 50 rollout steps and 50 training steps per epoch. We use a discount factor of 0.98 for Fetch and 0.95 for Hand environments. The Lipschitz constant L for HGG is set to 5.0 and the distance weight c is set to 3.0 (these values are collected into a configuration sketch below). |
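
The Pseudocode row above refers to Algorithm 1 (HGG) in the paper. The following is a minimal Python sketch of its central idea only: rather than always training toward the distant target goals, the agent selects intermediate hindsight goals from achieved states in the replay buffer, trading off distance to the targets (weight c) against the current value estimate (scaled by the Lipschitz constant L). The helper names (`generate_hindsight_goals`, `value_fn`, `achieved_goals`) are illustrative assumptions rather than the authors' API, and the paper solves a bipartite matching over full trajectories rather than the independent per-goal argmin shown here.

```python
import numpy as np

def generate_hindsight_goals(target_goals, achieved_goals, value_fn, c=3.0, L=5.0):
    """Hedged sketch of HGG goal selection (not the authors' implementation).

    For each desired target goal, pick an already-achieved goal from the
    replay buffer that is close to the target (weighted by c) while having
    a high value estimate (scaled by 1/L); the selected goals serve as the
    intermediate training goals for the next round of rollouts.
    """
    intermediate_goals = []
    for g_target in target_goals:
        # Distance-to-target term weighted by c, minus a value bonus scaled by 1/L.
        scores = [
            c * np.linalg.norm(g_target - g_achieved) - value_fn(g_achieved) / L
            for g_achieved in achieved_goals
        ]
        intermediate_goals.append(achieved_goals[int(np.argmin(scores))])
    return intermediate_goals
```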
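The Experiment Setup row lists the reported hyperparameters; gathered into one place they look roughly as follows. This is only a convenience sketch: the key names are illustrative and do not correspond to identifiers in the authors' repository.

```python
# Hyperparameters as reported in the paper, collected into one mapping.
# Key names (e.g. "actor_lr", "hgg_lipschitz_L") are illustrative labels.
HGG_CONFIG = {
    "algorithm": "DDPG",
    "batch_size": 256,
    "actor_lr": 1e-3,
    "critic_lr": 1e-3,
    "optimizer": "Adam",
    "epochs": 5000,
    "rollout_steps_per_epoch": 50,
    "training_steps_per_epoch": 50,
    "discount_factor": {"Fetch": 0.98, "Hand": 0.95},
    "hgg_lipschitz_L": 5.0,
    "hgg_distance_weight_c": 3.0,
}
```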