Hindsight Experience Replay
Authors: Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba (OpenAI)
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our approach on the task of manipulating objects with a robotic arm. In particular, we run experiments on three different tasks: pushing, sliding, and pick-and-place, in each case using only binary rewards indicating whether or not the task is completed. Our ablation studies show that Hindsight Experience Replay is a crucial ingredient which makes training possible in these challenging environments. |
| Researcher Affiliation | Collaboration | Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, Wojciech Zaremba. marcin@openai.com. We would also like to thank Rein Houthooft and the whole OpenAI team for fruitful discussions. |
| Pseudocode | Yes | See Alg. 1 for a more formal description of the algorithm. Algorithm 1 Hindsight Experience Replay (HER). (A hedged sketch of the goal-relabeling step appears after this table.) |
| Open Source Code | No | The paper provides a link to a video demonstrating the experiments ('The video presenting our experiments is available at https://goo.gl/SMrQnI.') but does not provide a link to the source code for the methodology described. |
| Open Datasets | No | The paper states, 'There are no standard environments for multi-goal RL and therefore we created our own environments.' While it describes the environment setup and task details, it does not provide access information (link, DOI, repository, or formal citation) for a public or open dataset used for training. |
| Dataset Splits | Yes | The results are averaged across 5 random seeds and shaded areas represent one standard deviation. An episode is considered successful if the distance between the object and the goal at the end of the episode is less than 7cm for pushing and pick-and-place and less than 20cm for sliding. |
| Hardware Specification | No | The paper mentions using 'a 7-DOF Fetch Robotics arm' and the 'MuJoCo (Todorov et al., 2012) physics engine' for simulation, and deployment on a 'physical robot'. However, it does not provide specific details about the computing hardware (e.g., GPU/CPU models, RAM) used to train the models in simulation. |
| Software Dependencies | No | The robot is simulated using the MuJoCo (Todorov et al., 2012) physics engine. Training is performed using the DDPG algorithm (Lillicrap et al., 2015) with Adam (Kingma and Ba, 2014) as the optimizer. The paper lists software components but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Policies are represented as Multi-Layer Perceptrons (MLPs) with Rectified Linear Unit (ReLU) activation functions. Training is performed using the DDPG algorithm (Lillicrap et al., 2015) with Adam (Kingma and Ba, 2014) as the optimizer. See Appendix A for more details and the values of all hyperparameters. For all tasks the initial position of the gripper is fixed, while the initial position of the object and the target are randomized. See Appendix A for details. (A hedged sketch of such a policy network appears after this table.) |
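
The pseudocode row above refers to the paper's Algorithm 1 (HER). Below is a minimal Python sketch of the core goal-relabeling step, using the "future" replay strategy described in the paper; the field names, the distance threshold, and the value of `k` are illustrative assumptions, not taken from the authors' code.

```python
import random

def her_relabel(episode, k=4, distance_threshold=0.05):
    """Hindsight relabeling sketch ("future" strategy): for each transition,
    store the original goal plus k goals sampled from states achieved later
    in the same episode, recomputing the sparse binary reward for each."""
    # episode: list of dicts with keys
    #   'obs', 'action', 'next_obs', 'achieved_goal', 'desired_goal'
    # (keys are hypothetical, chosen for this sketch)
    def sparse_reward(achieved, goal):
        # binary reward: 0 if the goal is considered reached, -1 otherwise
        dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
        return 0.0 if dist < distance_threshold else -1.0

    replay = []
    horizon = len(episode)
    for t, tr in enumerate(episode):
        # 1) standard experience replay with the originally desired goal
        replay.append({**tr,
                       'reward': sparse_reward(tr['achieved_goal'],
                                               tr['desired_goal'])})
        # 2) hindsight replay: pretend a later achieved goal was the target
        for _ in range(k):
            future = random.randint(t, horizon - 1)
            new_goal = episode[future]['achieved_goal']
            replay.append({**tr,
                           'desired_goal': new_goal,
                           'reward': sparse_reward(tr['achieved_goal'],
                                                   new_goal)})
    return replay
```

The experiment-setup row quotes the paper's use of MLP policies with ReLU activations trained by DDPG with Adam. The following is a minimal PyTorch sketch of such a goal-conditioned actor network; the layer widths, input dimensions, and learning rate are placeholder assumptions (the paper's actual hyperparameters are listed in its Appendix A).

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """MLP policy with ReLU activations, conditioned on observation and goal.
    Sizes are illustrative, not the paper's exact hyperparameters."""
    def __init__(self, obs_dim, goal_dim, act_dim, hidden=256, max_action=1.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # actions squashed to [-1, 1]
        )
        self.max_action = max_action

    def forward(self, obs, goal):
        # the policy acts on the concatenation of observation and goal
        return self.max_action * self.net(torch.cat([obs, goal], dim=-1))

# DDPG trains the actor with Adam, as in the paper (dimensions and lr assumed)
actor = Actor(obs_dim=10, goal_dim=3, act_dim=4)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
```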