Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
Authors: Yevgen Chebotar, Karol Hausman, Yao Lu, Ted Xiao, Dmitry Kalashnikov, Jacob Varley, Alex Irpan, Benjamin Eysenbach, Ryan C Julian, Chelsea Finn, Sergey Levine
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we aim to answer the following questions: 1) How does our method compare to previous methods for learning goal-conditioned policies from offline data, such as goal-conditioned behavioral cloning and standard Q-learning with hindsight relabeling? 2) Can our method learn diverse skills on real robots with high-dimensional camera images? 3) Does our proposed goal chaining technique facilitate learning to reach long-horizon goals? 4) Is goal reaching an effective pre-training step or a suitable auxiliary objective for accelerating learning of downstream reward-driven skills? (A hedged goal-relabeling sketch follows the table.) |
| Researcher Affiliation | Collaboration | Robotics at Google; Carnegie Mellon University; University of Southern California; Stanford University; UC Berkeley. Correspondence to: Yevgen Chebotar <chebotar@google.com>. |
| Pseudocode | Yes | Algorithm 1 Goal reaching with Actionable Models |
| Open Source Code | No | The paper does not provide a direct link to a source code repository or an explicit statement about releasing the code for the described methodology. |
| Open Datasets | No | The paper states: 'Our real robot training data comes from a wide range of reinforcement learning experiments conducted as part of another research project using the same platform (Kalashnikov et al., 2021).' and 'In our simulated experiments, we generate a dataset of trajectories from RL runs for the four tasks depicted in Fig. 6.' However, it does not provide concrete access information (link, DOI, repository) for these datasets. |
| Dataset Splits | No | The paper describes its training data and evaluations, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | The data contains over 800,000 episodes, which altogether amounts to over 6 months of continuous robotic interaction on 7 KUKA robots. |
| Software Dependencies | No | The paper mentions using the 'QT-Opt framework' (Kalashnikov et al., 2018) but does not provide specific version numbers for this framework or for other software dependencies such as Python, TensorFlow/PyTorch, or CUDA. |
| Experiment Setup | Yes | The initial variances of the (x, y, z) end-effector positions and the azimuth angle during the CEM-optimization (Rubinstein & Kroese, 2004) are set to (3cm, 3cm, 6cm, 0.16 rad), respectively. We run CEM for 2 iterations with 64 samples per iteration and 10% elite percentile. (A hedged CEM sketch follows the table.) |
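
The Research Type row compares against goal-conditioned behavioral cloning and standard Q-learning with hindsight relabeling. As a minimal sketch of the generic hindsight-relabeling idea (Andrychowicz et al., 2017), not the paper's exact procedure, the snippet below relabels each step of an offline episode with a goal drawn from the states actually reached later in that same episode; the `Transition` container and the uniform future-goal sampling are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import List

@dataclass
class Transition:
    state: object      # observation at time t (e.g., a camera image)
    action: object     # action taken at time t
    next_state: object # observation at time t + 1
    goal: object       # goal this transition is relabeled with
    reached: bool      # True if next_state attains the goal

def hindsight_relabel(episode_states, episode_actions) -> List[Transition]:
    """Relabel an offline episode with hindsight goals.

    For each step t, a goal is sampled uniformly from the states visited
    at some later step t' >= t + 1, so every relabeled transition becomes
    a valid example of reaching *some* goal. The paper's goal-sampling
    distribution and success definition may differ from this sketch.
    """
    transitions: List[Transition] = []
    T = len(episode_actions)  # episode_states has T + 1 entries
    for t in range(T):
        # Sample a future state of this episode as the hindsight goal.
        g_idx = random.randint(t + 1, T)
        transitions.append(Transition(
            state=episode_states[t],
            action=episode_actions[t],
            next_state=episode_states[t + 1],
            goal=episode_states[g_idx],
            reached=(t + 1 == g_idx),  # sparse goal-reaching signal
        ))
    return transitions
```
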
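The Experiment Setup row fixes the CEM hyperparameters: 2 iterations, 64 samples per iteration, a 10% elite percentile, and initial (x, y, z, azimuth) scales of (3cm, 3cm, 6cm, 0.16 rad). The sketch below shows generic cross-entropy action selection against a goal-conditioned critic under those settings; `q_fn` and the toy `toy_q` critic are hypothetical stand-ins for the paper's QT-Opt-based critic, and the quoted "variances" are treated here as per-dimension standard deviations.

```python
import numpy as np

def cem_select_action(q_fn, state, goal, mean, std,
                      iterations=2, num_samples=64, elite_frac=0.10):
    """Pick an action by cross-entropy optimization over Q-values.

    Mirrors the quoted setup: 2 CEM iterations, 64 samples per
    iteration, 10% elite percentile. `q_fn(state, goal, actions)` is a
    hypothetical batched critic returning one score per candidate.
    """
    n_elite = max(1, int(num_samples * elite_frac))  # 6 of 64 samples
    for _ in range(iterations):
        # Sample candidate (x, y, z, azimuth) actions from a Gaussian.
        actions = np.random.normal(mean, std, size=(num_samples, len(mean)))
        scores = q_fn(state, goal, actions)
        elite = actions[np.argsort(scores)[-n_elite:]]  # top 10%
        # Refit the sampling distribution to the elite set.
        mean, std = elite.mean(axis=0), elite.std(axis=0)
    return mean  # final mean serves as the selected action

# Initial per-dimension scales from the quoted setup, treated here as
# standard deviations: 3cm, 3cm, 6cm in position, 0.16 rad in azimuth.
init_mean = np.zeros(4)
init_std = np.array([0.03, 0.03, 0.06, 0.16])

# Toy usage with a stand-in critic that ignores the state and simply
# prefers actions close to a 4-d goal vector.
toy_q = lambda state, goal, actions: -np.linalg.norm(actions - goal, axis=1)
a = cem_select_action(toy_q, state=None,
                      goal=np.array([0.02, 0.0, 0.04, 0.1]),
                      mean=init_mean, std=init_std)
```

With only 2 iterations, CEM here behaves as little more than a filtered random-shooting optimizer, which is consistent with the lightweight inference-time setup quoted above.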