Behavior Contrastive Learning for Unsupervised Skill Discovery
Authors: Rushuai Yang, Chenjia Bai, Hongyi Guo, Siyuan Li, Bin Zhao, Zhen Wang, Peng Liu, Xuelong Li
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on challenging mazes and continuous control tasks. The results show that our method generates diverse and far-reaching skills, and also obtains competitive performance in downstream tasks compared to the state-of-the-art methods. |
| Researcher Affiliation | Academia | 1Shanghai Artificial Intelligence Laboratory, China; 2Harbin Institute of Technology, China; 3Northwestern University, USA; 4Northwestern Polytechnical University, China. |
| Pseudocode | Yes | Algorithm 1 (BeCL: Unsupervised pretraining) and Algorithm 2 (BeCL: Finetuning with extrinsic rewards) are provided in Appendix C.3. |
| Open Source Code | Yes | The open-sourced code is available at https://github.com/Rooshy-yang/BeCL. |
| Open Datasets | Yes | We evaluate BeCL on DMC tasks from the URLB benchmark (Laskin et al., 2021), which require discovering more complicated skills to achieve the desired behaviors. |
| Dataset Splits | No | The paper describes pretraining (2M steps) and finetuning (100K steps) on different downstream tasks for evaluation. However, it does not specify a distinct validation split (with percentages or counts) for model selection or hyperparameter tuning, as is typical in supervised learning contexts. |
| Hardware Specification | Yes | Pretraining one seed of BeCL for 2M steps takes about 18 hours, while fine-tuning on downstream tasks for 100k steps takes about 30 minutes with an A100 GPU. |
| Software Dependencies | No | The paper mentions 'DDPG' as the basic RL algorithm and 'Adam' as the optimizer, but it does not provide version numbers for these or for the underlying software stack, such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | Table 1. Hyper-parameters used for BeCL and DDPG. This table lists detailed parameters such as 'Skill dim 16 discrete', 'Temperature κ 0.5', 'Replay buffer capacity 10^6', 'Mini-batch size 1024', 'Discount (γ) 0.99', 'Learning rate 10^-4', 'Number pretraining frames 2x10^6', and 'Number finetuning frames 1x10^5' (collected into a config sketch below). |
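For reference, the Table 1 values quoted above can be gathered into a single configuration block. This is a minimal sketch assuming a plain Python dictionary; the key names are illustrative and are not the identifiers used in the authors' repository.

```python
# Hyperparameters transcribed from Table 1 of the paper.
# Key names are illustrative, not the repository's actual config keys.
BECL_CONFIG = {
    "skill_dim": 16,                   # discrete skill space
    "temperature_kappa": 0.5,          # contrastive temperature
    "replay_buffer_capacity": 10**6,
    "mini_batch_size": 1024,
    "discount_gamma": 0.99,
    "learning_rate": 1e-4,             # Adam optimizer
    "num_pretraining_frames": 2 * 10**6,
    "num_finetuning_frames": 1 * 10**5,
}
```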
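The temperature κ in Table 1 points to an InfoNCE-style objective, consistent with the paper's framing of behavior contrastive learning: two states generated by the same skill form a positive pair, while states from other skills serve as negatives. The sketch below is a generic PyTorch rendering of such a loss, not the paper's exact objective; the encoder producing the embeddings and the pair-sampling scheme are assumed.

```python
import torch
import torch.nn.functional as F

def behavior_contrastive_loss(anchor, positive, temperature=0.5):
    """InfoNCE-style sketch: anchor[i] and positive[i] embed two states
    produced by the same skill; all mismatched pairs act as negatives.
    Both inputs are (batch, dim) state embeddings from a hypothetical encoder.
    """
    # Cosine similarity via L2-normalized embeddings.
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    # (batch, batch) similarity matrix, scaled by the temperature kappa.
    logits = anchor @ positive.t() / temperature
    # Entry (i, i) is the positive pair for row i; the rest are negatives.
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)
```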