Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills

Authors: Yevgen Chebotar, Karol Hausman, Yao Lu, Ted Xiao, Dmitry Kalashnikov, Jacob Varley, Alex Irpan, Benjamin Eysenbach, Ryan C. Julian, Chelsea Finn, Sergey Levine

ICML 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 'In our experiments, we aim to answer the following questions: 1) How does our method compare to previous methods for learning goal-conditioned policies from offline data, such as goal-conditioned behavioral cloning and standard Q-learning with hindsight relabeling? 2) Can our method learn diverse skills on real robots with high-dimensional camera images? 3) Does our proposed goal chaining technique facilitate learning to reach long-horizon goals? 4) Is goal reaching an effective pre-training step or a suitable auxiliary objective for accelerating learning of downstream reward-driven skills?' |
| Researcher Affiliation | Collaboration | ¹Robotics at Google, ²Carnegie Mellon University, ³University of Southern California, ⁴Stanford University, ⁵UC Berkeley. Correspondence to: Yevgen Chebotar <chebotar@google.com>. |
| Pseudocode | Yes | Algorithm 1: Goal reaching with Actionable Models |
| Open Source Code | No | The paper does not link to a source code repository or explicitly state that code for the described method will be released. |
| Open Datasets | No | The paper states: 'Our real robot training data comes from a wide range of reinforcement learning experiments conducted as part of another research project using the same platform (Kalashnikov et al., 2021).' and 'In our simulated experiments, we generate a dataset of trajectories from RL runs for the four tasks depicted in Fig. 6.' However, it provides no concrete access information (link, DOI, or repository) for these datasets. |
| Dataset Splits | No | The paper mentions evaluation and training data, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | 'The data contains over 800,000 episodes, which altogether amounts to over 6 months of continuous robotic interaction on 7 KUKA robots.' |
| Software Dependencies | No | The paper mentions using the QT-Opt framework (Kalashnikov et al., 2018) but does not give version numbers for it or for ancillary software such as Python, TensorFlow/PyTorch, or CUDA. |
| Experiment Setup | Yes | 'The initial variances of the (x, y, z) end-effector positions and the azimuth angle during the CEM-optimization (Rubinstein & Kroese, 2004) are set to (3cm, 3cm, 6cm, 0.16 rad), respectively. We run CEM for 2 iterations with 64 samples per iteration and 10% elite percentile.' |
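The experiment-setup quote gives the CEM hyperparameters used to select actions: 2 iterations, 64 samples per iteration, a 10% elite percentile, and per-dimension spreads of (3 cm, 3 cm, 6 cm, 0.16 rad). A minimal sketch of such a cross-entropy-method optimizer follows; it assumes the quoted values act as per-dimension standard deviations and uses a hypothetical quadratic Q-function, so it is an illustration of the technique rather than the paper's implementation:

```python
import random
import statistics

random.seed(0)  # fixed seed so the toy run is reproducible

def cem_optimize(q_fn, init_mean, init_std, iterations=2, samples=64, elite_frac=0.10):
    """Cross-entropy method: sample actions from a diagonal Gaussian, keep the
    top elite fraction ranked by Q-value, and refit the Gaussian to the elites."""
    mean, std = list(init_mean), list(init_std)
    n_elite = max(2, int(samples * elite_frac))  # 10% of 64 -> 6 elites
    for _ in range(iterations):
        batch = [[random.gauss(m, s) for m, s in zip(mean, std)]
                 for _ in range(samples)]
        batch.sort(key=q_fn, reverse=True)
        elites = batch[:n_elite]
        # Refit mean and std per dimension to the elite samples.
        mean = [statistics.mean(dim) for dim in zip(*elites)]
        std = [statistics.stdev(dim) for dim in zip(*elites)]
    return mean

# Hypothetical Q-function peaked at an assumed target (x, y, z, azimuth) offset.
target = [0.01, -0.02, 0.04, 0.1]
q = lambda a: -sum((x - t) ** 2 for x, t in zip(a, target))

# Spreads from the paper's quote: 3 cm, 3 cm, 6 cm, 0.16 rad.
best = cem_optimize(q, init_mean=[0.0] * 4, init_std=[0.03, 0.03, 0.06, 0.16])
```

With only 2 iterations and 64 samples, CEM trades optimization accuracy for the low latency needed to run the Q-function-based policy on a real robot at control rate.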
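The report's Research Type and Pseudocode rows both reference hindsight relabeling of offline trajectories (Algorithm 1). A minimal sketch of the common "future" relabeling strategy on a toy discrete trajectory is shown below; the function name, tuple layout, and binary reward are illustrative assumptions, not the paper's exact formulation:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def hindsight_relabel(trajectory):
    """Relabel each transition with a goal drawn from states actually reached
    later in the same episode, so every offline trajectory yields successful
    goal-reaching examples for training a goal-conditioned Q-function."""
    relabeled = []
    for t, (state, action, next_state) in enumerate(trajectory):
        # "Future" strategy: pick an achieved state from the episode's remainder.
        goal = random.choice([s_next for (_, _, s_next) in trajectory[t:]])
        reward = 1.0 if next_state == goal else 0.0
        relabeled.append((state, action, goal, reward, next_state))
    return relabeled

# Toy trajectory of (state, action, next_state) tuples.
traj = [(0, "a", 1), (1, "b", 2), (2, "c", 3)]
data = hindsight_relabel(traj)
```

Because the goal is always a state the episode actually reached, the final transition of every episode is relabeled as a success, which gives the learner positive examples without any reward annotation.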