Unsupervised Reinforcement Learning with Contrastive Intrinsic Control
Authors: Michael Laskin, Hao Liu, Xue Bin Peng, Denis Yarats, Aravind Rajeswaran, Pieter Abbeel
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithm on the Unsupervised RL Benchmark (URLB) in the asymptotic state-based setting, which consists of a long reward-free pretraining phase followed by a short adaptation phase to downstream tasks with extrinsic rewards. We find that CIC improves over prior exploration algorithms in terms of adaptation efficiency to downstream tasks on state-based URLB. |
| Researcher Affiliation | Collaboration | Michael Laskin (UC Berkeley, mlaskin@berkeley.edu); Hao Liu (UC Berkeley); Xue Bin Peng (UC Berkeley); Denis Yarats (NYU, Meta AI); Aravind Rajeswaran (UC Berkeley, Meta AI); Pieter Abbeel (UC Berkeley, Covariant) |
| Pseudocode | Yes | PyTorch-like pseudocode for the CIC loss [...] Listing 1: Pseudocode for the CIC loss. (See the loss sketch below the table.) |
| Open Source Code | Yes | Project website and code: https://sites.google.com/view/cicneurips2022/ |
| Open Datasets | Yes | We evaluate our approach on tasks from URLB, which consists of twelve downstream tasks across three challenging continuous control domains for exploration algorithms: walker, quadruped, and Jaco arm. [...] All baselines were run for 10 seeds per downstream task for each algorithm using the code and hyperparameters provided by URLB [21]. |
| Dataset Splits | No | The paper describes pretraining for 2M steps and finetuning for 100k steps in an RL environment, but it does not specify traditional train/validation/test dataset splits with percentages, sample counts, or predefined static splits. (The pretrain/finetune protocol constants are sketched below the table.) |
| Hardware Specification | Yes | Appendix L: Compute. All experiments were run on 4 Nvidia A100 GPUs. |
| Software Dependencies | No | The paper mentions 'All the experiments were run in Python 3.8.12' but does not list multiple key software components with their versions or provide specific version numbers for other major software like dm_control or PyTorch. |
| Experiment Setup | Yes | "We fix the hyperparameters across all domains and downstream tasks. We refer the reader to the Appendices D and E for the full algorithm and a full list of hyperparameters." (Section 4; confirmed by checking Appendices D and E, which contain detailed hyperparameter tables.) |
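
The CIC loss quoted in the Pseudocode row is a noise-contrastive (NCE) objective between embeddings of state transitions τ = (s, s') and skill vectors z. Below is a minimal PyTorch sketch of such a loss, assuming the embeddings are already computed; the function and argument names (`cic_nce_loss`, `tau_emb`, `skill_emb`, `temperature`) are illustrative and not taken from the paper's Listing 1 or the released code.

```python
# Illustrative sketch only: an NCE-style contrastive loss between transition
# embeddings g1(tau) and skill embeddings g2(z), in the spirit of the paper's
# Listing 1. Names and the temperature default are assumptions.
import torch
import torch.nn.functional as F

def cic_nce_loss(tau_emb: torch.Tensor, skill_emb: torch.Tensor,
                 temperature: float = 0.5) -> torch.Tensor:
    """tau_emb:   (N, D) embeddings of transitions (s, s').
    skill_emb: (N, D) embeddings of the skills z that generated them.
    Row i of each tensor is a positive pair; all other rows act as negatives.
    """
    q = F.normalize(tau_emb, dim=-1)   # cosine similarity via normalization
    k = F.normalize(skill_emb, dim=-1)
    logits = q @ k.T / temperature     # (N, N) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)  # positives on diagonal
    return F.cross_entropy(logits, labels)
```

Treating each row of the batch as its own class and applying cross-entropy is the standard way to express an InfoNCE-style objective in PyTorch; the diagonal entries of the similarity matrix are the positive pairs and every off-diagonal entry serves as a negative.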
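The Open Datasets and Dataset Splits rows together pin down the evaluation protocol: 2M reward-free pretraining steps, 100k finetuning steps, 10 seeds per downstream task, and twelve tasks across three domains. Here is a small self-contained sketch of those constants; the dataclass and its field names are illustrative, not part of URLB or the CIC codebase.

```python
# Protocol constants taken from the table rows above; the dataclass and its
# field names are assumptions made for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class URLBProtocol:
    pretrain_steps: int = 2_000_000       # reward-free pretraining phase
    finetune_steps: int = 100_000         # adaptation with extrinsic reward
    seeds_per_task: int = 10              # per URLB's evaluation protocol
    domains: tuple = ("walker", "quadruped", "jaco")
    tasks_per_domain: int = 4             # twelve downstream tasks in total

protocol = URLBProtocol()
assert len(protocol.domains) * protocol.tasks_per_domain == 12
```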