Data-efficient Hindsight Off-policy Option Learning
Authors: Markus Wulfmeier, Dushyant Rao, Roland Hafner, Thomas Lampe, Abbas Abdolmaleki, Tim Hertweck, Michael Neunert, Dhruva Tirumala, Noah Siegel, Nicolas Heess, Martin Riedmiller
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The approach outperforms existing option learning methods on common benchmarks. To better understand the option framework and disentangle benefits from both temporal and action abstraction, we evaluate ablations with flat policies and mixture policies with comparable optimization. The results highlight the importance of both types of abstraction as well as off-policy training and trust-region constraints, particularly in challenging, simulated 3D robot manipulation tasks from raw pixel inputs. |
| Researcher Affiliation | Industry | 1DeepMind, London, United Kingdom. Correspondence to: Markus Wulfmeier <mwulfmeier@google.com>. |
| Pseudocode | Yes | Algorithm 1 Hindsight Off-policy Options |
| Open Source Code | No | The paper does not provide a link to source code or an explicit statement about its public availability. |
| Open Datasets | Yes | We use a set of common OpenAI Gym (Brockman et al., 2016) benchmarks to answer these questions. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages or sample counts) needed to reproduce the experiment. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like TensorFlow and OpenAI Gym, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | The paper mentions aspects of the experimental setup such as input resolution (e.g., 64x64 pixel) and data augmentation strategies (e.g., multi-task learning, hindsight experience replay), but it does not provide specific details like hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings). |
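The "Pseudocode" row above refers to Algorithm 1 (Hindsight Off-policy Options), which treats option choices as latent variables inferred in hindsight from observed actions. As a rough, self-contained illustration of that core idea only (not a reproduction of the paper's algorithm), the sketch below computes a posterior over options for a single observed action under a simple mixture-of-Gaussians policy; all function names, the Gaussian component choice, and the numbers are assumptions for illustration.

```python
import math

def gaussian_logpdf(x, mean, std):
    # Log-density of a 1-D Gaussian with the given mean and std.
    return -0.5 * (((x - mean) / std) ** 2 + math.log(2 * math.pi * std ** 2))

def hindsight_option_posterior(action, means, stds, prior):
    # Hypothetical helper: p(option | action) ∝ p(option) * p(action | option)
    # under a mixture-of-Gaussians policy. Illustrative only; not from the paper.
    log_joint = [
        math.log(prior[o]) + gaussian_logpdf(action, means[o], stds[o])
        for o in range(len(prior))
    ]
    m = max(log_joint)
    unnorm = [math.exp(l - m) for l in log_joint]  # subtract max for stability
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Two options with different preferred actions; the observed action 0.9
# lies near option 1's mean, so the posterior should favor option 1.
posterior = hindsight_option_posterior(
    0.9, means=[-1.0, 1.0], stds=[0.5, 0.5], prior=[0.5, 0.5]
)
```

Such a posterior is what allows off-policy option learning to credit replayed actions to options after the fact, rather than only to the option that was executed.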