Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Authors: Yevgen Chebotar, Karol Hausman, Yao Lu, Ted Xiao, Dmitry Kalashnikov, Jacob Varley, Alex Irpan, Benjamin Eysenbach, Ryan C. Julian, Chelsea Finn, Sergey Levine

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'In our experiments, we aim to answer the following questions: 1) How does our method compare to previous methods for learning goal-conditioned policies from offline data, such as goal-conditioned behavioral cloning and standard Q-learning with hindsight relabeling? 2) Can our method learn diverse skills on real robots with high-dimensional camera images? 3) Does our proposed goal chaining technique facilitate learning to reach long-horizon goals? 4) Is goal reaching an effective pre-training step or a suitable auxiliary objective for accelerating learning of downstream reward-driven skills?' |
| Researcher Affiliation | Collaboration | ¹Robotics at Google, ²Carnegie Mellon University, ³University of Southern California, ⁴Stanford University, ⁵UC Berkeley. Correspondence to: Yevgen Chebotar <chebotar@google.com>. |
| Pseudocode | Yes | Algorithm 1: Goal reaching with Actionable Models |
| Open Source Code | No | The paper does not link to a source code repository or explicitly state that code for the described method will be released. |
| Open Datasets | No | The paper states: 'Our real robot training data comes from a wide range of reinforcement learning experiments conducted as part of another research project using the same platform (Kalashnikov et al., 2021).' and 'In our simulated experiments, we generate a dataset of trajectories from RL runs for the four tasks depicted in Fig. 6.' However, it provides no concrete access information (link, DOI, or repository) for these datasets. |
| Dataset Splits | No | The paper mentions evaluation and training data, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | 'The data contains over 800,000 episodes, which altogether amounts to over 6 months of continuous robotic interaction on 7 KUKA robots.' |
| Software Dependencies | No | The paper mentions using the QT-Opt framework (Kalashnikov et al., 2018) but does not give version numbers for it or for ancillary software such as Python, TensorFlow/PyTorch, or CUDA. |
| Experiment Setup | Yes | 'The initial variances of the (x, y, z) end-effector positions and the azimuth angle during the CEM-optimization (Rubinstein & Kroese, 2004) are set to (3cm, 3cm, 6cm, 0.16 rad), respectively. We run CEM for 2 iterations with 64 samples per iteration and 10% elite percentile.' |
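The experiment-setup quote gives the CEM hyperparameters used to select actions: 2 iterations, 64 samples per iteration, a 10% elite percentile, and per-dimension spreads of (3 cm, 3 cm, 6 cm, 0.16 rad). A minimal sketch of such a cross-entropy-method optimizer follows; it assumes the quoted values act as per-dimension standard deviations and uses a hypothetical quadratic Q-function, so it is an illustration of the technique rather than the paper's implementation:

```python
import random
import statistics

random.seed(0)  # fixed seed so the toy run is reproducible

def cem_optimize(q_fn, init_mean, init_std, iterations=2, samples=64, elite_frac=0.10):
    """Cross-entropy method: sample actions from a diagonal Gaussian, keep the
    top elite fraction ranked by Q-value, and refit the Gaussian to the elites."""
    mean, std = list(init_mean), list(init_std)
    n_elite = max(2, int(samples * elite_frac))  # 10% of 64 -> 6 elites
    for _ in range(iterations):
        batch = [[random.gauss(m, s) for m, s in zip(mean, std)]
                 for _ in range(samples)]
        batch.sort(key=q_fn, reverse=True)
        elites = batch[:n_elite]
        # Refit mean and std per dimension to the elite samples.
        mean = [statistics.mean(dim) for dim in zip(*elites)]
        std = [statistics.stdev(dim) for dim in zip(*elites)]
    return mean

# Hypothetical Q-function peaked at an assumed target (x, y, z, azimuth) offset.
target = [0.01, -0.02, 0.04, 0.1]
q = lambda a: -sum((x - t) ** 2 for x, t in zip(a, target))

# Spreads from the paper's quote: 3 cm, 3 cm, 6 cm, 0.16 rad.
best = cem_optimize(q, init_mean=[0.0] * 4, init_std=[0.03, 0.03, 0.06, 0.16])
```

With only 2 iterations and 64 samples, CEM trades optimization accuracy for the low latency needed to run the Q-function-based policy on a real robot at control rate.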
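The report's Research Type and Pseudocode rows both reference hindsight relabeling of offline trajectories (Algorithm 1). A minimal sketch of the common "future" relabeling strategy on a toy discrete trajectory is shown below; the function name, tuple layout, and binary reward are illustrative assumptions, not the paper's exact formulation:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def hindsight_relabel(trajectory):
    """Relabel each transition with a goal drawn from states actually reached
    later in the same episode, so every offline trajectory yields successful
    goal-reaching examples for training a goal-conditioned Q-function."""
    relabeled = []
    for t, (state, action, next_state) in enumerate(trajectory):
        # "Future" strategy: pick an achieved state from the episode's remainder.
        goal = random.choice([s_next for (_, _, s_next) in trajectory[t:]])
        reward = 1.0 if next_state == goal else 0.0
        relabeled.append((state, action, goal, reward, next_state))
    return relabeled

# Toy trajectory of (state, action, next_state) tuples.
traj = [(0, "a", 1), (1, "b", 2), (2, "c", 3)]
data = hindsight_relabel(traj)
```

Because the goal is always a state the episode actually reached, the final transition of every episode is relabeled as a success, which gives the learner positive examples without any reward annotation.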