Learning by Playing - Solving Sparse Reward Tasks from Scratch
Authors: Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Vlad Mnih, Nicolas Heess, Jost Tobias Springenberg
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach. A video of the rich set of learned behaviours can be found at https://youtu.be/mPKyvocNeM. |
| Researcher Affiliation | Industry | 1DeepMind, London, GB. Correspondence to: Martin Riedmiller <riedmiller@google.com>. |
| Pseudocode | No | No pseudocode or algorithm blocks were explicitly labeled or formatted as such within the paper. |
| Open Source Code | No | The paper does not provide a link to open-source code for the described methodology. It only provides a link to a video demonstrating learned behaviors. |
| Open Datasets | No | The paper describes experiments conducted in simulation and on a real robot where data is generated. It does not use or provide access information for a publicly available or open dataset. It states, 'all simulation results are obtained in an off-policy learning setup where data is gathered by multiple agents (36 actors)'. |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits in terms of percentages or counts, as it generates its own data through simulation and real-robot interaction. |
| Hardware Specification | No | The paper does not specify particular hardware components (e.g., specific GPU or CPU models) used for training or running simulations; it only mentions 'Kinova Jaco robot arm in simulation and on hardware' and 'a single robot arm'. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers. |
| Experiment Setup | Yes | Episodes lasted for 360 steps in total with scheduler choices every ξ = 180 steps, resulting in two choices per episode. ... (36 actors) which send collected experience to a pool of 36 learners. |
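
The quoted experiment-setup details describe an episode/scheduler timing structure. The following is a minimal sketch of that structure, not the authors' implementation: the `env`, `intention_policies`, and `scheduler.choose` interfaces are hypothetical placeholders, and only the timing constants (360-step episodes, a scheduler decision every ξ = 180 steps, i.e. two choices per episode) come from the paper; the 36-actor / 36-learner distributed setup is not modeled here.

```python
# Sketch of the episode/scheduler timing described in the Experiment Setup row.
# Assumptions: policies are callables keyed by task name, env follows the
# common Gym step() convention, and the per-task reward dict is illustrative.
import random

EPISODE_STEPS = 360   # total environment steps per episode (from the paper)
XI = 180              # scheduler period xi: steps between intention switches

def run_episode(env, intention_policies, scheduler=None):
    """Roll out one episode, picking an intention every XI steps (two per episode)."""
    obs = env.reset()
    trajectory = []
    active_task = None
    for t in range(EPISODE_STEPS):
        if t % XI == 0:
            # SAC-U would sample the intention uniformly; a learned scheduler
            # (SAC-Q) could condition on the trajectory so far (hypothetical hook).
            active_task = (scheduler.choose(trajectory) if scheduler
                           else random.choice(list(intention_policies)))
        action = intention_policies[active_task](obs)
        obs, rewards, done, _ = env.step(action)  # rewards: dict of per-task rewards (assumed)
        trajectory.append((obs, action, rewards, active_task))
        if done:
            break
    return trajectory
```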