Unsupervised Reinforcement Learning with Contrastive Intrinsic Control

Authors: Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pretraining phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB."
Researcher Affiliation | Collaboration | Michael Laskin (UC Berkeley, mlaskin@berkeley.edu); Hao Liu (UC Berkeley); Xue Bin Peng (UC Berkeley); Denis Yarats (NYU, Meta AI); Aravind Rajeswaran (UC Berkeley, Meta AI); Pieter Abbeel (UC Berkeley, Covariant)
Pseudocode | Yes | "PyTorch-like pseudocode for the CIC loss [...] Listing 1: Pseudocode for the CIC loss." (A hedged sketch of a contrastive loss in this style is given after the table.)
Open Source Code | Yes | "Project website and code: https://sites.google.com/view/cicneurips2022/"
Open Datasets | Yes | "We evaluate our approach on tasks from URLB, which consists of twelve downstream tasks across three challenging continuous control domains for exploration algorithms: walker, quadruped, and Jaco arm. [...] All baselines were run for 10 seeds per downstream task for each algorithm using the code and hyperparameters provided by URLB [21]."
Dataset Splits | No | The paper describes pre-training for 2M steps and finetuning for 100k steps in an RL environment, but it does not specify traditional train/validation/test dataset splits with percentages, sample counts, or predefined static splits.
Hardware Specification | Yes | "Appendix L: Compute. All experiments were run on 4 Nvidia A100 GPUs."
Software Dependencies | No | The paper mentions 'All the experiments were run in Python 3.8.12' but does not list multiple key software components with their versions, nor does it provide specific version numbers for other major software such as dm_control or PyTorch.
Experiment Setup | Yes | "We fix the hyperparameters across all domains and downstream tasks. We refer the reader to the Appendices D and E for the full algorithm and a full list of hyperparameters." (Section 4; also confirmed by checking Appendices D and E, which contain detailed hyperparameter tables.)
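
The Pseudocode row refers to the paper's Listing 1, a PyTorch-like listing of the CIC loss. As a rough illustration only, the snippet below is a minimal InfoNCE-style contrastive loss between transition embeddings and skill embeddings; the function name, argument names, and temperature value are assumptions for this sketch, and it is not the paper's actual listing.

```python
# Minimal, hedged sketch of an InfoNCE-style contrastive loss between
# state-transition embeddings and skill embeddings. This is NOT the paper's
# Listing 1; names and the temperature value are illustrative assumptions.
import torch
import torch.nn.functional as F


def contrastive_loss(transition_embed: torch.Tensor,
                     skill_embed: torch.Tensor,
                     temperature: float = 0.5) -> torch.Tensor:
    # L2-normalize both embedding batches so dot products are cosine similarities.
    q = F.normalize(transition_embed, dim=-1)  # shape (B, D)
    k = F.normalize(skill_embed, dim=-1)       # shape (B, D)

    # (B, B) similarity matrix; diagonal entries pair each transition
    # with its own skill, off-diagonal entries act as negatives.
    logits = q @ k.t() / temperature

    # Cross-entropy against the diagonal indices maximizes agreement for
    # matched pairs relative to the other B-1 skills in the batch.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    # Example usage with random embeddings (batch of 64, embedding dim 64).
    B, D = 64, 64
    loss = contrastive_loss(torch.randn(B, D), torch.randn(B, D))
    print(loss.item())
```

The choice of temperature and negatives here is illustrative; consult the paper's Listing 1 and released code for the exact form of the CIC objective.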