Data-efficient Hindsight Off-policy Option Learning
Authors: Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. |
| Researcher Affiliation | Industry | 1DeepMind, London, United Kingdom. Correspondence to: Markus Wulfmeier <mwulfmeier@google.com>. |
| Pseudocode | Yes | Algorithm 1 Hindsight Off-policy Options |
| Open Source Code | No | The paper does not provide a link to source code or an explicit statement about its public availability. |
| Open Datasets | Yes | We use a set of common OpenAI Gym (Brockman et al., 2016) benchmarks to answer these questions. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages or sample counts) needed to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like TensorFlow and OpenAI Gym, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper mentions aspects of the experimental setup such as input resolution (e.g., 64x64 pixel) and data augmentation strategies (e.g., multi-task learning, hindsight experience replay), but it does not provide specific details like hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings). |
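The "Pseudocode" row above refers to Algorithm 1 (Hindsight Off-policy Options), which treats option choices as latent variables inferred in hindsight from observed actions. As a rough, self-contained illustration of that core idea only (not a reproduction of the paper's algorithm), the sketch below computes a posterior over options for a single observed action under a simple mixture-of-Gaussians policy; all function names, the Gaussian component choice, and the numbers are assumptions for illustration.

```python
import math

def gaussian_logpdf(x, mean, std):
    # Log-density of a 1-D Gaussian with the given mean and std.
    return -0.5 * (((x - mean) / std) ** 2 + math.log(2 * math.pi * std ** 2))

def hindsight_option_posterior(action, means, stds, prior):
    # Hypothetical helper: p(option | action) ∝ p(option) * p(action | option)
    # under a mixture-of-Gaussians policy. Illustrative only; not from the paper.
    log_joint = [
        math.log(prior[o]) + gaussian_logpdf(action, means[o], stds[o])
        for o in range(len(prior))
    ]
    m = max(log_joint)
    unnorm = [math.exp(l - m) for l in log_joint]  # subtract max for stability
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two options with different preferred actions; the observed action 0.9
# lies near option 1's mean, so the posterior should favor option 1.
posterior = hindsight_option_posterior(
    0.9, means=[-1.0, 1.0], stds=[0.5, 0.5], prior=[0.5, 0.5]
)
```

Such a posterior is what allows off-policy option learning to credit replayed actions to options after the fact, rather than only to the option that was executed.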