Skew-Fit: State-Covering Self-Supervised Reinforcement Learning

Authors: Vitchyr Pong, Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, Sergey Levine

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that combining Skew-Fit for learning goal distributions with existing goal-reaching methods outperforms a variety of prior methods on open-sourced visual goal-reaching tasks and that Skew-Fit enables a real-world robot to learn to open a door, entirely from scratch, from pixels, and without any manually-designed reward function.
Researcher Affiliation | Academia | University of California, Berkeley.
Pseudocode | Yes | Algorithm 1 Skew-Fit
Open Source Code | No | The paper does not provide a direct link to source code for its method. It states: 'Videos of Skew-Fit solving this task and the simulated tasks can be viewed on our website: https://sites.google.com/view/skew-fit', but this link is for videos, not code.
Open Datasets | Yes | To our knowledge, these are the only goal-conditioned, vision-based continuous control environments that are publicly available and experimentally evaluated in prior work, making them a good point of comparison.
Dataset Splits | No | The paper describes dynamic data collection for reinforcement learning rather than the fixed training/validation/test splits common in supervised learning, and it reports no split percentages or counts for any dataset.
Hardware Specification | No | The paper mentions running experiments on a 'real-world robot' but does not specify the hardware (e.g., CPU or GPU models, memory, or other computing infrastructure) used for training or inference in any of its experiments.
Software Dependencies | No | The paper mentions software components such as soft actor-critic (SAC) and the β-VAE but does not provide version numbers for any software libraries, frameworks, or environments.
Experiment Setup | Yes | The β-VAE hyperparameters used to train q^G_φt are given in Appendix C.2. As seen in Figure 3, sampling uniformly from previous experience (α = 0) to set goals results in a policy that primarily sets goals near the initial state distribution. Implementation details of SAC and the prior works are given in Appendix C.3.
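For reference, the pseudocode cited above (Algorithm 1, Skew-Fit) centers on resampling goals from previously visited states with weights proportional to q^G_φt(s)^α, where α < 0 up-weights rarely visited states and α = 0 recovers uniform sampling from past experience (the ablation quoted in the Experiment Setup row). The sketch below illustrates only that skewed-sampling step under stated assumptions: the function names are hypothetical, the density model's log-likelihoods are assumed to be precomputed (the paper trains a β-VAE for q^G_φt), and this is not the authors' released implementation.

```python
import numpy as np

def skewed_goal_weights(log_densities, alpha=-1.0):
    """Importance weights w_i proportional to q_phi(s_i)^alpha.

    With alpha < 0, states the current density model considers unlikely
    (i.e., rarely visited) are up-weighted; alpha = 0 gives uniform
    weights, i.e., plain sampling from previous experience.
    """
    # Work in log space for numerical stability:
    # log w_i = alpha * log q_phi(s_i), then normalize via softmax.
    logits = alpha * np.asarray(log_densities, dtype=np.float64)
    logits -= logits.max()          # guard against overflow in exp
    weights = np.exp(logits)
    return weights / weights.sum()

def sample_skewed_goals(replay_states, log_densities, n_goals,
                        alpha=-1.0, rng=None):
    """Sample goal states from the replay buffer under the skewed distribution."""
    if rng is None:
        rng = np.random.default_rng()
    weights = skewed_goal_weights(log_densities, alpha)
    indices = rng.choice(len(replay_states), size=n_goals, p=weights)
    return [replay_states[i] for i in indices]
```

In the full algorithm, goals sampled this way would be passed to a goal-conditioned policy (the paper pairs Skew-Fit with SAC), and the density model q^G_φt would then be retrained on the resulting, more uniform state distribution.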