Unsupervised Reinforcement Learning with Contrastive Intrinsic Control
Authors: Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pretraining phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB. |
| Researcher Affiliation | Collaboration | Michael Laskin (UC Berkeley, mlaskin@berkeley.edu); Hao Liu (UC Berkeley); Xue Bin Peng (UC Berkeley); Denis Yarats (NYU, Meta AI); Aravind Rajeswaran (UC Berkeley, Meta AI); Pieter Abbeel (UC Berkeley, Covariant) |
| Pseudocode | Yes | PyTorch-like pseudocode for the CIC loss [...] Listing 1: Pseudocode for the CIC loss. (See the loss sketch below the table.) |
| Open Source Code | Yes | Project website and code: https://sites.google.com/view/cicneurips2022/ |
| Open Datasets | Yes | We evaluate our approach on tasks from URLB, which consists of twelve downstream tasks across three challenging continuous control domains for exploration algorithms: walker, quadruped, and Jaco arm. [...] All baselines were run for 10 seeds per downstream task for each algorithm using the code and hyperparameters provided by URLB [21]. |
| Dataset Splits | No | The paper describes pretraining for 2M steps and finetuning for 100k steps in an RL environment, but it does not specify traditional train/validation/test dataset splits with percentages, sample counts, or predefined static splits. (The pretrain/finetune protocol constants are sketched below the table.) |
| Hardware Specification | Yes | Appendix L: Compute. All experiments were run on 4 Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions 'All the experiments were run in Python 3.8.12' but does not list multiple key software components with their versions or provide specific version numbers for other major software like dm_control or PyTorch. |
| Experiment Setup | Yes | "We fix the hyperparameters across all domains and downstream tasks. We refer the reader to the Appendices D and E for the full algorithm and a full list of hyperparameters." (Section 4; confirmed by checking Appendices D and E, which contain detailed hyperparameter tables.) |
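
The CIC loss quoted in the Pseudocode row is a noise-contrastive (NCE) objective between embeddings of state transitions τ = (s, s') and skill vectors z. Below is a minimal PyTorch sketch of such a loss, assuming the embeddings are already computed; the function and argument names (`cic_nce_loss`, `tau_emb`, `skill_emb`, `temperature`) are illustrative and not taken from the paper's Listing 1 or the released code.

```python
# Illustrative sketch only: an NCE-style contrastive loss between transition
# embeddings g1(tau) and skill embeddings g2(z), in the spirit of the paper's
# Listing 1. Names and the temperature default are assumptions.
import torch
import torch.nn.functional as F

def cic_nce_loss(tau_emb: torch.Tensor, skill_emb: torch.Tensor,
                 temperature: float = 0.5) -> torch.Tensor:
    """tau_emb:   (N, D) embeddings of transitions (s, s').
    skill_emb: (N, D) embeddings of the skills z that generated them.
    Row i of each tensor is a positive pair; all other rows act as negatives.
    """
    q = F.normalize(tau_emb, dim=-1)   # cosine similarity via normalization
    k = F.normalize(skill_emb, dim=-1)
    logits = q @ k.T / temperature     # (N, N) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # positives on diagonal
    return F.cross_entropy(logits, labels)
```

Treating each row of the batch as its own class and applying cross-entropy is the standard way to express an InfoNCE-style objective in PyTorch; the diagonal entries of the similarity matrix are the positive pairs and every off-diagonal entry serves as a negative.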
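The Open Datasets and Dataset Splits rows together pin down the evaluation protocol: 2M reward-free pretraining steps, 100k finetuning steps, 10 seeds per downstream task, and twelve tasks across three domains. Here is a small self-contained sketch of those constants; the dataclass and its field names are illustrative, not part of URLB or the CIC codebase.

```python
# Protocol constants taken from the table rows above; the dataclass and its
# field names are assumptions made for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class URLBProtocol:
    pretrain_steps: int = 2_000_000       # reward-free pretraining phase
    finetune_steps: int = 100_000         # adaptation with extrinsic reward
    seeds_per_task: int = 10              # per URLB's evaluation protocol
    domains: tuple = ("walker", "quadruped", "jaco")
    tasks_per_domain: int = 4             # twelve downstream tasks in total

protocol = URLBProtocol()
assert len(protocol.domains) * protocol.tasks_per_domain == 12
```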