Constrained Ensemble Exploration for Unsupervised Skill Discovery

Authors: Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on extensive experiments on several challenging tasks, we find our method learns well-explored ensemble skills and achieves superior performance in various downstream tasks compared to previous methods. In this section, we compare the performance of unsupervised RL methods in challenging URLB tasks (Laskin et al., 2021). We also conduct experiments in a maze to illustrate the learned skills in a continuous 2D space. We finally conduct visualizations and ablation studies of our method.
Researcher Affiliation | Collaboration | 1 Shanghai Artificial Intelligence Laboratory; 2 Shenzhen Research Institute of Northwestern Polytechnical University; 3 Hong Kong University of Science and Technology; 4 Tencent; 5 East China University of Science and Technology; 6 The Institute of Artificial Intelligence (Tele AI), China Telecom.
Pseudocode | Yes | We give algorithmic descriptions of the pretraining and finetuning stages in Algorithm 1 and Algorithm 2, respectively. (Algorithm 1: Unsupervised Pretraining of CeSD; Algorithm 2: Downstream Finetuning of CeSD.) A schematic sketch of this two-stage structure is given after the table.
Open Source Code | Yes | The open-sourced code is available at https://github.com/Baichenjia/CeSD.
Open Datasets | Yes | We evaluate CeSD in the URLB benchmark (Laskin et al., 2021). There are three domains (i.e., Walker, Quadruped, and Jaco), and each domain has four different downstream tasks. The environment is based on DMC (Tassa et al., 2018). The full domain/task grid is written out after the table.
Dataset Splits | No | The paper specifies the training schedule (2M pre-training steps, 100K fine-tuning steps) and a mini-batch size of 1024, but it does not explicitly describe training/validation/test splits via percentages, sample counts, or references to predefined splits.
Hardware Specification | No | The paper gives one example: "pretraining one seed of CeSD for 2M steps takes about 11 hours while fine-tuning downstream tasks for 100k steps takes about 20 minutes with a single 4090 GPU". However, it does not provide comprehensive hardware specifications (e.g., CPU, memory, number of GPUs used across all experiments, or cloud computing instance types).
Software Dependencies | No | Table 1 lists hyperparameters for the Adam optimizer and the agent update frequency, but the paper does not specify version numbers for core software components such as Python, PyTorch, TensorFlow, or CUDA libraries, which are essential for reproducibility.
Experiment Setup | Yes | In the unsupervised training stage, each method is trained for 2M steps with its intrinsic reward. Then we randomly sample a skill as the policy condition and fine-tune the policy for 100K steps in each downstream task for fast adaptation. We run 10 random seeds for each baseline... Table 1 summarizes the hyperparameters of our method and the basic DDPG algorithm for all methods, including replay buffer capacity (10^6), mini-batch size (1024), discount (0.99), and learning rate (10^-4). These reported values are collected into a configuration sketch after the table.
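The pseudocode row above refers to Algorithm 1 (unsupervised pretraining) and Algorithm 2 (downstream finetuning) in the paper. As a reading aid only, here is a minimal Python sketch of that two-stage structure; the agent, environment, and buffer helpers are hypothetical placeholders and are not taken from the released repository.

# Schematic sketch of the two-stage pipeline the paper describes
# (Algorithm 1: unsupervised pretraining; Algorithm 2: downstream finetuning).
# All helper names (agent, env, buffer and their methods) are hypothetical
# placeholders and are NOT taken from the official CeSD code.

PRETRAIN_STEPS = 2_000_000   # "2M steps" of pretraining with the intrinsic reward
FINETUNE_STEPS = 100_000     # "100K steps" of finetuning per downstream task


def pretrain(agent, env, buffer):
    """Algorithm 1 (sketch): explore with the intrinsic reward only."""
    obs, skill = env.reset(), agent.sample_skill()
    for _ in range(PRETRAIN_STEPS):
        action = agent.act(obs, skill)
        next_obs, _, done, _ = env.step(action)            # extrinsic reward ignored
        r_int = agent.intrinsic_reward(obs, action, next_obs, skill)
        buffer.add(obs, action, r_int, next_obs, skill)
        agent.update(buffer)                                # DDPG-style update
        obs, skill = (env.reset(), agent.sample_skill()) if done else (next_obs, skill)


def finetune(agent, env, buffer):
    """Algorithm 2 (sketch): fix one sampled skill, adapt on the task reward."""
    skill = agent.sample_skill()                            # "randomly sample a skill"
    obs = env.reset()
    for _ in range(FINETUNE_STEPS):
        action = agent.act(obs, skill)
        next_obs, r_ext, done, _ = env.step(action)         # downstream task reward
        buffer.add(obs, action, r_ext, next_obs, skill)
        agent.update(buffer)
        obs = env.reset() if done else next_obs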
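The Open Datasets row states that URLB has three DMC domains with four downstream tasks each. For reference, that grid is written out below; the individual task names follow the standard URLB suite (Laskin et al., 2021) and are not listed in the quoted passage itself.

# URLB evaluation grid: 3 DMC domains x 4 downstream tasks each.
# Only the domain names and the "four tasks per domain" count appear in the
# quoted text; the task names below follow the standard URLB suite.
URLB_TASKS = {
    "walker":    ["stand", "walk", "run", "flip"],
    "quadruped": ["stand", "walk", "run", "jump"],
    "jaco":      ["reach_top_left", "reach_top_right",
                  "reach_bottom_left", "reach_bottom_right"],
}

assert all(len(tasks) == 4 for tasks in URLB_TASKS.values())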
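Finally, the hyperparameters quoted in the Software Dependencies and Experiment Setup rows can be gathered into a single configuration sketch. Only the values explicitly reported (the paper's Table 1 and the quoted setup text) are filled in; the field names themselves are illustrative and do not come from the released code.

from dataclasses import dataclass


@dataclass
class CeSDConfig:
    """Hyperparameters reported for CeSD and the shared DDPG backbone.
    Field names are illustrative; values are those quoted above."""
    replay_buffer_capacity: int = 10**6   # replay buffer capacity
    batch_size: int = 1024                # mini-batch size
    discount: float = 0.99                # discount factor
    learning_rate: float = 1e-4           # Adam learning rate
    pretrain_steps: int = 2_000_000       # unsupervised pretraining (2M steps)
    finetune_steps: int = 100_000         # finetuning per downstream task (100K steps)
    num_seeds: int = 10                   # random seeds per baseline
    backbone: str = "DDPG"                # base RL algorithm used by all methods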