Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?

Authors: Xiang Li, Jinghuan Shang, Srijan Das, Michael Ryoo

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive set of experiments with various self-supervised losses. After evaluating these approaches together in multiple environments, including a real-world robot environment, we confirm that no single self-supervised loss or image augmentation method dominates all environments, and that the current framework for joint optimization of SSL and RL is limited. Finally, we conduct an ablation study on multiple factors and demonstrate the properties of representations learned with different approaches.
Researcher Affiliation | Academia | Department of Computer Science, Stony Brook University; University of North Carolina at Charlotte
Pseudocode | Yes | The pseudo-code of the SAC update alternating RL and SSL is provided in Algorithm 1; a minimal sketch of this alternating update appears after the table.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Please check supplemental material
Open Datasets | Yes | DMControl (DeepMind Control Suite) [72] contains many challenging visual continuous control tasks, which are widely utilized by recent papers. Atari 2600 games are also challenging benchmarks, but with a discrete action space [4]. An environment-loading sketch for both benchmarks appears after the table.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It describes training for a fixed number of environment steps and then evaluating performance, but it does not define validation splits in the way typical for supervised learning datasets.
Hardware Specification | Yes | We utilize approximately 200 NVIDIA RTX 3090 GPUs to complete all experiments within three months.
Software Dependencies | No | The paper mentions the software components and frameworks used (e.g., SAC, Rainbow DQN, and, implicitly, PyTorch), but it does not specify version numbers for these dependencies.
Experiment Setup | Yes | We mainly follow the hyper-parameters and the test environments reported in CURL, except that we use the same learning rate of 10^-3 in all environments for simplicity. All methods are benchmarked at 100k environment steps, with a training batch size of 512 under 10 random seeds, and they share the same policy network capacity. A configuration sketch collecting these settings appears after the table.
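
As referenced in the Pseudocode row, Algorithm 1 alternates a SAC update with an SSL update on a shared encoder. Below is a minimal, self-contained PyTorch sketch of that alternating scheme, not the authors' code: the toy module sizes, the placeholder RL losses, the noise-based augmentation, and the cosine-similarity SSL loss are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of one training iteration that
# alternates an RL update and an SSL update on a shared pixel encoder.
# Module sizes, the placeholder losses, and the augmentation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder  = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 50))  # toy pixel encoder
actor    = nn.Linear(50, 6)       # toy policy head
critic   = nn.Linear(50 + 6, 1)   # toy Q(s, a) head
ssl_head = nn.Linear(50, 50)      # toy projection head for the SSL objective

rl_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(actor.parameters()) + list(critic.parameters()),
    lr=1e-3)
ssl_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(ssl_head.parameters()), lr=1e-3)

def augment(obs):
    # Stand-in for the paper's image augmentations (e.g., random shift/crop).
    return obs + 0.01 * torch.randn_like(obs)

def update(obs, action, reward):
    # --- RL step: placeholder losses standing in for the full SAC objectives ---
    z = encoder(obs)
    q = critic(torch.cat([z, action], dim=-1))
    critic_loss = F.mse_loss(q, reward)  # toy TD-style regression target
    actor_loss = -critic(
        torch.cat([z.detach(), torch.tanh(actor(z.detach()))], dim=-1)).mean()
    rl_opt.zero_grad()
    (critic_loss + actor_loss).backward()
    rl_opt.step()

    # --- SSL step: toy consistency loss between two augmented views ---
    z1 = ssl_head(encoder(augment(obs)))
    z2 = encoder(augment(obs))
    ssl_loss = -F.cosine_similarity(z1, z2.detach(), dim=-1).mean()
    ssl_opt.zero_grad()
    ssl_loss.backward()
    ssl_opt.step()

update(torch.randn(8, 3, 84, 84), torch.randn(8, 6), torch.randn(8, 1))
```

The key design point the sketch illustrates is that the two objectives share the encoder but are stepped by separate optimizers, which is what the joint-optimization framework the paper critiques looks like in its simplest form.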
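
Both benchmarks named in the Open Datasets row are publicly available. Below is a short sketch of instantiating one environment from each, assuming the dm_control and gym packages are installed; the particular task and game names are examples, not necessarily the paper's exact set.

```python
# Example environment setup for the two public benchmarks; the specific
# task/game names are illustrative, not necessarily the paper's exact set.
from dm_control import suite  # DeepMind Control Suite
import gym                    # Atari 2600 via Gym

# DMControl: continuous control (pixel observations are typically obtained
# through a rendering wrapper on top of the state-based environment).
dmc_env = suite.load(domain_name="cheetah", task_name="run")
timestep = dmc_env.reset()

# Atari: discrete action space.
atari_env = gym.make("PongNoFrameskip-v4")
obs = atari_env.reset()
```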
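
The settings quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below contains only the values stated above; the field names themselves are illustrative assumptions.

```python
# The stated experiment settings collected in one place; field names are
# illustrative, and no values beyond those quoted in the table are added.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    learning_rate: float = 1e-3  # same learning rate in all environments
    env_steps: int = 100_000     # benchmark point: 100k environment steps
    batch_size: int = 512        # training batch size
    num_seeds: int = 10          # random seeds per method

print(ExperimentConfig())
```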