Behavior Contrastive Learning for Unsupervised Skill Discovery

Authors: Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on challenging mazes and continuous control tasks. The results show that our method generates diverse and far-reaching skills, and obtains competitive performance on downstream tasks compared with state-of-the-art methods.
Researcher Affiliation | Academia | (1) Shanghai Artificial Intelligence Laboratory, China; (2) Harbin Institute of Technology, China; (3) Northwestern University, USA; (4) Northwestern Polytechnical University, China.
Pseudocode | Yes | Algorithm 1 (BeCL: unsupervised pretraining) and Algorithm 2 (BeCL: fine-tuning with extrinsic rewards) are provided in Appendix C.3.
Open Source Code | Yes | The open-source code is available at https://github.com/Rooshy-yang/BeCL.
Open Datasets | Yes | We evaluate BeCL on DMC tasks from the URLB benchmark (Laskin et al., 2021), which require discovering more complex skills to achieve the desired behaviors.
Dataset Splits | No | The paper describes pretraining (2M steps) and fine-tuning (100K steps) on different downstream tasks for evaluation, but it does not specify a distinct validation split, with percentages or counts, for model selection or hyperparameter tuning, as is typical for reproducible data partitioning in supervised learning.
Hardware Specification | Yes | Pretraining one seed of BeCL for 2M steps takes about 18 hours, while fine-tuning on downstream tasks for 100K steps takes about 30 minutes on an A100 GPU.
Software Dependencies | No | The paper mentions DDPG as the base RL algorithm and Adam as the optimizer, but it does not give version numbers for these or for the surrounding software stack (e.g., Python, PyTorch, or TensorFlow).
Experiment Setup | Yes | Table 1 ("Hyper-parameters used for BeCL and DDPG") lists detailed settings such as skill dim 16 (discrete), temperature κ = 0.5, replay buffer capacity 10^6, mini-batch size 1024, discount γ = 0.99, learning rate 10^-4, 2×10^6 pretraining frames, and 1×10^5 fine-tuning frames.
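The Experiment Setup row above pins the contrastive temperature (κ = 0.5) and a 16-dimensional discrete skill space. As a minimal sketch of the kind of InfoNCE-style intrinsic reward a behavior-contrastive method optimizes, assuming a hypothetical state `encoder` network and batch layout (this is not the authors' implementation; see the repository linked above for that):

```python
import torch
import torch.nn.functional as F

def contrastive_intrinsic_reward(anchor, positive, negatives, encoder, kappa=0.5):
    """InfoNCE-style intrinsic reward sketch.

    anchor, positive: states drawn from the same skill's rollout, (B, state_dim)
    negatives: states drawn from other skills, (B, N, state_dim)
    encoder: hypothetical network mapping states to D-dim embeddings
    kappa: contrastive temperature (0.5 in the paper's Table 1)
    """
    z_a = F.normalize(encoder(anchor), dim=-1)            # (B, D)
    z_p = F.normalize(encoder(positive), dim=-1)          # (B, D)
    z_n = F.normalize(encoder(negatives), dim=-1)         # (B, N, D)

    pos = (z_a * z_p).sum(dim=-1, keepdim=True) / kappa   # (B, 1)
    neg = torch.einsum('bd,bnd->bn', z_a, z_n) / kappa    # (B, N)
    logits = torch.cat([pos, neg], dim=-1)                # (B, 1 + N)

    # Log-probability that the anchor matches its same-skill positive
    # among all candidates; higher when skills are well separated.
    return pos.squeeze(-1) - torch.logsumexp(logits, dim=-1)
```

Here states visited under the same skill act as positives and states from other skills as negatives, so maximizing this reward during pretraining encourages skills to produce mutually distinguishable behaviors.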