Hindsight Experience Replay
Authors: Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba (OpenAI)
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. |
| Researcher Affiliation | Collaboration | Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. marcin@openai.com. We would also like to thank Rein Houthooft and the whole OpenAI team for fruitful discussions. |
| Pseudocode | Yes | See Alg. 1 for a more formal description of the algorithm. Algorithm 1 Hindsight Experience Replay (HER). (A hedged sketch of the goal-relabeling step appears after this table.) |
| Open Source Code | No | The paper provides a link to a video demonstrating the experiments ('The video presenting our experiments is available at https://goo.gl/SMrQnI.') but does not provide a link to the source code for the methodology described. |
| Open Datasets | No | The paper states, 'There are no standard environments for multi-goal RL and therefore we created our own environments.' While it describes the environment setup and task details, it does not provide access information (link, DOI, repository, or formal citation) for a public or open dataset used for training. |
| Dataset Splits | Yes | The results are averaged across 5 random seeds and shaded areas represent one standard deviation. An episode is considered successful if the distance between the object and the goal at the end of the episode is less than 7cm for pushing and pick-and-place and less than 20cm for sliding. |
| Hardware Specification | No | The paper mentions using 'a 7-DOF Fetch Robotics arm' and the 'MuJoCo (Todorov et al., 2012) physics engine' for simulation, and deployment on a 'physical robot'. However, it does not provide specific details about the computing hardware (e.g., GPU/CPU models, RAM) used to train the models in simulation. |
| Software Dependencies | No | The robot is simulated using the MuJoCo (Todorov et al., 2012) physics engine. Training is performed using the DDPG algorithm (Lillicrap et al., 2015) with Adam (Kingma and Ba, 2014) as the optimizer. The paper lists software components but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Policies are represented as Multi-Layer Perceptrons (MLPs) with Rectified Linear Unit (ReLU) activation functions. Training is performed using the DDPG algorithm (Lillicrap et al., 2015) with Adam (Kingma and Ba, 2014) as the optimizer. See Appendix A for more details and the values of all hyperparameters. For all tasks the initial position of the gripper is fixed, while the initial position of the object and the target are randomized. See Appendix A for details. (A hedged sketch of such a policy network appears after this table.) |
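
The pseudocode row above refers to the paper's Algorithm 1 (HER). Below is a minimal Python sketch of the core goal-relabeling step, using the "future" replay strategy described in the paper; the field names, the distance threshold, and the value of `k` are illustrative assumptions, not taken from the authors' code.

```python
import random

def her_relabel(episode, k=4, distance_threshold=0.05):
    """Hindsight relabeling sketch ("future" strategy): for each transition,
    store the original goal plus k goals sampled from states achieved later
    in the same episode, recomputing the sparse binary reward for each."""
    # episode: list of dicts with keys
    #   'obs', 'action', 'next_obs', 'achieved_goal', 'desired_goal'
    # (keys are hypothetical, chosen for this sketch)
    def sparse_reward(achieved, goal):
        # binary reward: 0 if the goal is considered reached, -1 otherwise
        dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
        return 0.0 if dist < distance_threshold else -1.0

    replay = []
    horizon = len(episode)
    for t, tr in enumerate(episode):
        # 1) standard experience replay with the originally desired goal
        replay.append({**tr,
                       'reward': sparse_reward(tr['achieved_goal'],
                                               tr['desired_goal'])})
        # 2) hindsight replay: pretend a later achieved goal was the target
        for _ in range(k):
            future = random.randint(t, horizon - 1)
            new_goal = episode[future]['achieved_goal']
            replay.append({**tr,
                           'desired_goal': new_goal,
                           'reward': sparse_reward(tr['achieved_goal'],
                                                   new_goal)})
    return replay
```

The experiment-setup row quotes the paper's use of MLP policies with ReLU activations trained by DDPG with Adam. The following is a minimal PyTorch sketch of such a goal-conditioned actor network; the layer widths, input dimensions, and learning rate are placeholder assumptions (the paper's actual hyperparameters are listed in its Appendix A).

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """MLP policy with ReLU activations, conditioned on observation and goal.
    Sizes are illustrative, not the paper's exact hyperparameters."""
    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions squashed to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, obs, goal):
        # the policy acts on the concatenation of observation and goal
        return self.max_action * self.net(torch.cat([obs, goal], dim=-1))

# DDPG trains the actor with Adam, as in the paper (dimensions and lr assumed)
actor = Actor(obs_dim=10, goal_dim=3, act_dim=4)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
```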