Skew-Fit: State-Covering Self-Supervised Reinforcement Learning
Authors: Vitchyr Pong, Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, Sergey Levine
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that combining Skew-Fit for learning goal distributions with existing goal-reaching methods outperforms a variety of prior methods on open-sourced visual goal-reaching tasks and that Skew-Fit enables a real-world robot to learn to open a door, entirely from scratch, from pixels, and without any manually-designed reward function. |
| Researcher Affiliation | Academia | University of California, Berkeley. |
| Pseudocode | Yes | Algorithm 1 Skew-Fit |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology. It mentions: 'Videos of Skew-Fit solving this task and the simulated tasks can be viewed on our website: https://sites.google.com/view/skew-fit' but this link is for videos, not code. |
| Open Datasets | Yes | To our knowledge, these are the only goal-conditioned, vision-based continuous control environments that are publicly available and experimentally evaluated in prior work, making them a good point of comparison. |
| Dataset Splits | No | The paper describes dynamic data collection for reinforcement learning rather than fixed dataset splits for training, validation, and testing as is common in supervised learning. It does not provide specific percentages or counts for training, validation, or test splits of any dataset. |
| Hardware Specification | No | The paper mentions running experiments on a 'real-world robot' but does not specify the hardware used (e.g., CPU, GPU models, memory, or specific computing infrastructure) for training or inference in any of its experiments. |
| Software Dependencies | No | The paper mentions software components like 'soft actor critic (SAC)' and 'β-VAE' but does not provide specific version numbers for any software libraries, frameworks, or environments used. |
| Experiment Setup | Yes | The β-VAE hyperparameters used to train q^G_{φ_t} are given in Appendix C.2. As seen in Figure 3, sampling uniformly from previous experience (α = 0) to set goals results in a policy that primarily sets goals near the initial state distribution. Implementation details of SAC and the prior works are given in Appendix C.3. A sketch of the Skew-Fit goal-sampling step is given after this table. |
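
To make the goal-sampling step referenced in the table concrete, below is a minimal sketch of how Skew-Fit skews the replay-buffer state distribution by q_φ(s)^α before drawing goals. It is written in Python with NumPy; the function name, the `log_density_fn` placeholder (standing in for the log-likelihood under the trained β-VAE), and the buffer layout are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def skew_fit_sample_goals(states, log_density_fn, alpha=-1.0, num_goals=1, rng=None):
    """Sample goals from previously visited states, up-weighting rare states.

    Skew-Fit reweights each visited state s by q_phi(s)**alpha with alpha in
    [-1, 0]. alpha = 0 recovers uniform sampling over past experience (the
    baseline discussed in the table), while alpha < 0 up-weights low-density,
    rarely visited states so the goal distribution spreads toward uniform
    coverage of the reachable state space.

    Args:
        states: array of shape (N, state_dim) of previously visited states.
        log_density_fn: callable returning log q_phi(s) for each state,
            e.g. an (approximate) log-likelihood from a trained beta-VAE.
        alpha: skew exponent in [-1, 0].
        num_goals: number of goals to draw.
        rng: optional numpy random Generator.
    """
    rng = rng or np.random.default_rng()
    log_q = np.asarray(log_density_fn(states))   # shape (N,)
    log_w = alpha * log_q                        # log of unnormalized skew weights
    log_w -= log_w.max()                         # numerical stability before exp
    weights = np.exp(log_w)
    probs = weights / weights.sum()              # normalized skewed distribution
    idx = rng.choice(len(states), size=num_goals, p=probs)
    return states[idx]
```

In the paper's Algorithm 1, the same skew weights are also used as importance weights when retraining the generative model q^G_{φ_t}, so that over iterations the learned goal distribution moves toward uniform coverage of the states the policy can reach.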