Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?

Authors: Xiang Li, Jinghuan Shang, Srijan Das, Michael Ryoo

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an extensive set of experiments with various self-supervised losses. After evaluating these approaches together in multiple environments, including a real-world robot environment, we confirm that no single self-supervised loss or image augmentation method dominates all environments, and that the current framework for joint optimization of SSL and RL is limited. Finally, we conduct an ablation study on multiple factors and demonstrate the properties of representations learned with different approaches.
Researcher Affiliation | Academia | Department of Computer Science, Stony Brook University; University of North Carolina at Charlotte
Pseudocode | Yes | The pseudo-code of the SAC update alternating RL and SSL is provided in Algorithm 1; a minimal sketch of this alternating update appears after the table.
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] Please check supplemental material
Open Datasets | Yes | DMControl (DeepMind Control Suite) [72] contains many challenging visual continuous control tasks, which are widely utilized by recent papers. Atari 2600 games are also challenging benchmarks, but with a discrete action space [4]. An environment-loading sketch for both benchmarks appears after the table.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It describes training for a fixed number of environment steps and then evaluating performance, but it does not define validation splits in the way typical for supervised learning datasets.
Hardware Specification | Yes | We utilize approximately 200 NVIDIA RTX 3090 GPUs to complete all experiments within three months.
Software Dependencies | No | The paper mentions the software components and frameworks used (e.g., SAC, Rainbow DQN, and, implicitly, PyTorch), but it does not specify version numbers for these dependencies.
Experiment Setup | Yes | We mainly follow the hyper-parameters and the test environments reported in CURL, except that we use the same learning rate of 10^-3 in all environments for simplicity. All methods are benchmarked at 100k environment steps, with a training batch size of 512 under 10 random seeds, and they share the same policy network capacity. A configuration sketch collecting these settings appears after the table.
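
As referenced in the Pseudocode row, Algorithm 1 alternates a SAC update with an SSL update on a shared encoder. Below is a minimal, self-contained PyTorch sketch of that alternating scheme, not the authors' code: the toy module sizes, the placeholder RL losses, the noise-based augmentation, and the cosine-similarity SSL loss are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of one training iteration that
# alternates an RL update and an SSL update on a shared pixel encoder.
# Module sizes, the placeholder losses, and the augmentation are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder  = nn.Sequential(nn.Flatten(), nn.Linear(3 * 84 * 84, 50))  # toy pixel encoder
actor    = nn.Linear(50, 6)       # toy policy head
critic   = nn.Linear(50 + 6, 1)   # toy Q(s, a) head
ssl_head = nn.Linear(50, 50)      # toy projection head for the SSL objective

rl_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(actor.parameters()) + list(critic.parameters()),
    lr=1e-3)
ssl_opt = torch.optim.Adam(
    list(encoder.parameters()) + list(ssl_head.parameters()), lr=1e-3)

def augment(obs):
    # Stand-in for the paper's image augmentations (e.g., random shift/crop).
    return obs + 0.01 * torch.randn_like(obs)

def update(obs, action, reward):
    # --- RL step: placeholder losses standing in for the full SAC objectives ---
    z = encoder(obs)
    q = critic(torch.cat([z, action], dim=-1))
    critic_loss = F.mse_loss(q, reward)  # toy TD-style regression target
    actor_loss = -critic(
        torch.cat([z.detach(), torch.tanh(actor(z.detach()))], dim=-1)).mean()
    rl_opt.zero_grad()
    (critic_loss + actor_loss).backward()
    rl_opt.step()

    # --- SSL step: toy consistency loss between two augmented views ---
    z1 = ssl_head(encoder(augment(obs)))
    z2 = encoder(augment(obs))
    ssl_loss = -F.cosine_similarity(z1, z2.detach(), dim=-1).mean()
    ssl_opt.zero_grad()
    ssl_loss.backward()
    ssl_opt.step()

update(torch.randn(8, 3, 84, 84), torch.randn(8, 6), torch.randn(8, 1))
```

The key design point the sketch illustrates is that the two objectives share the encoder but are stepped by separate optimizers, which is what the joint-optimization framework the paper critiques looks like in its simplest form.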
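
Both benchmarks named in the Open Datasets row are publicly available. Below is a short sketch of instantiating one environment from each, assuming the dm_control and gym packages are installed; the particular task and game names are examples, not necessarily the paper's exact set.

```python
# Example environment setup for the two public benchmarks; the specific
# task/game names are illustrative, not necessarily the paper's exact set.
from dm_control import suite  # DeepMind Control Suite
import gym                    # Atari 2600 via Gym

# DMControl: continuous control (pixel observations are typically obtained
# through a rendering wrapper on top of the state-based environment).
dmc_env = suite.load(domain_name="cheetah", task_name="run")
timestep = dmc_env.reset()

# Atari: discrete action space.
atari_env = gym.make("PongNoFrameskip-v4")
obs = atari_env.reset()
```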
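
The settings quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below contains only the values stated above; the field names themselves are illustrative assumptions.

```python
# The stated experiment settings collected in one place; field names are
# illustrative, and no values beyond those quoted in the table are added.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    learning_rate: float = 1e-3  # same learning rate in all environments
    env_steps: int = 100_000     # benchmark point: 100k environment steps
    batch_size: int = 512        # training batch size
    num_seeds: int = 10          # random seeds per method

print(ExperimentConfig())
```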