Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
Authors: Vitchyr Pong, Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, Sergey Levine
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that combining Skew-Fit for learning goal distributions with existing goal-reaching methods outperforms a variety of prior methods on open-sourced visual goal-reaching tasks and that Skew-Fit enables a real-world robot to learn to open a door, entirely from scratch, from pixels, and without any manually-designed reward function. |
| Researcher Affiliation | Academia | University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Skew-Fit |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology. It mentions: 'Videos of Skew-Fit solving this task and the simulated tasks can be viewed on our website: https://sites.google.com/view/skew-fit' but this link is for videos, not code. |
| Open Datasets | Yes | To our knowledge, these are the only goal-conditioned, vision-based continuous control environments that are publicly available and experimentally evaluated in prior work, making them a good point of comparison. |
| Dataset Splits | No | The paper describes dynamic data collection for reinforcement learning rather than fixed dataset splits for training, validation, and testing as is common in supervised learning. It does not provide specific percentages or counts for training, validation, or test splits of any dataset. |
| Hardware Specification | No | The paper mentions running experiments on a 'real-world robot' but does not specify the hardware used (e.g., CPU, GPU models, memory, or specific computing infrastructure) for training or inference in any of its experiments. |
| Software Dependencies | No | The paper mentions software components like 'soft actor critic (SAC)' and 'β-VAE' but does not provide specific version numbers for any software libraries, frameworks, or environments used. |
| Experiment Setup | Yes | The β-VAE hyperparameters used to train q^G_{φ_t} are given in Appendix C.2. As seen in Figure 3, sampling uniformly from previous experience (α = 0) to set goals results in a policy that primarily sets goals near the initial state distribution. Implementation details of SAC and the prior works are given in Appendix C.3. A sketch of the Skew-Fit goal-sampling step is given after this table. |
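
To make the goal-sampling step referenced in the table concrete, below is a minimal sketch of how Skew-Fit skews the replay-buffer state distribution by q_φ(s)^α before drawing goals. It is written in Python with NumPy; the function name, the `log_density_fn` placeholder (standing in for the log-likelihood under the trained β-VAE), and the buffer layout are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def skew_fit_sample_goals(states, log_density_fn, alpha=-1.0, num_goals=1, rng=None):
    """Sample goals from previously visited states, up-weighting rare states.

    Skew-Fit reweights each visited state s by q_phi(s)**alpha with alpha in
    [-1, 0]. alpha = 0 recovers uniform sampling over past experience (the
    baseline discussed in the table), while alpha < 0 up-weights low-density,
    rarely visited states so the goal distribution spreads toward uniform
    coverage of the reachable state space.

    Args:
        states: array of shape (N, state_dim) of previously visited states.
        log_density_fn: callable returning log q_phi(s) for each state,
            e.g. an (approximate) log-likelihood from a trained beta-VAE.
        alpha: skew exponent in [-1, 0].
        num_goals: number of goals to draw.
        rng: optional numpy random Generator.
    """
    rng = rng or np.random.default_rng()
    log_q = np.asarray(log_density_fn(states))   # shape (N,)
    log_w = alpha * log_q                        # log of unnormalized skew weights
    log_w -= log_w.max()                         # numerical stability before exp
    weights = np.exp(log_w)
    probs = weights / weights.sum()              # normalized skewed distribution
    idx = rng.choice(len(states), size=num_goals, p=probs)
    return states[idx]
```

In the paper's Algorithm 1, the same skew weights are also used as importance weights when retraining the generative model q^G_{φ_t}, so that over iterations the learned goal distribution moves toward uniform coverage of the states the policy can reach.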