Constrained Ensemble Exploration for Unsupervised Skill Discovery

Authors: Chenjia Bai, Rushuai Yang, Qiaosheng Zhang, Kang Xu, Yi Chen, Ting Xiao, Xuelong Li

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on extensive experiments on several challenging tasks, we find our method learns well-explored ensemble skills and achieves superior performance in various downstream tasks compared to previous methods. In this section, we compare the performance of unsupervised RL methods in challenging URLB tasks (Laskin et al., 2021). We also conduct experiments in a maze to illustrate the learned skills in a continuous 2D space. We finally conduct visualizations and ablation studies of our method.
Researcher Affiliation | Collaboration | 1 Shanghai Artificial Intelligence Laboratory; 2 Shenzhen Research Institute of Northwestern Polytechnical University; 3 Hong Kong University of Science and Technology; 4 Tencent; 5 East China University of Science and Technology; 6 The Institute of Artificial Intelligence (Tele AI), China Telecom.
Pseudocode | Yes | We give algorithmic descriptions of the pretraining and finetuning stages in Algorithm 1 and Algorithm 2, respectively. (Algorithm 1: Unsupervised Pretraining of CeSD; Algorithm 2: Downstream Finetuning of CeSD.) A schematic sketch of this two-stage structure is given after the table.
Open Source Code | Yes | The open-sourced code is available at https://github.com/Baichenjia/CeSD.
Open Datasets | Yes | We evaluate CeSD in the URLB benchmark (Laskin et al., 2021). There are three domains (i.e., Walker, Quadruped, and Jaco), and each domain has four different downstream tasks. The environment is based on DMC (Tassa et al., 2018). The full domain/task grid is written out after the table.
Dataset Splits | No | The paper specifies the training schedule (2M pre-training steps, 100K fine-tuning steps) and a mini-batch size of 1024, but it does not explicitly describe training/validation/test splits via percentages, sample counts, or references to predefined splits.
Hardware Specification | No | The paper gives one example: "pretraining one seed of CeSD for 2M steps takes about 11 hours while fine-tuning downstream tasks for 100k steps takes about 20 minutes with a single 4090 GPU". However, it does not provide comprehensive hardware specifications (e.g., CPU, memory, number of GPUs used across all experiments, or cloud computing instance types).
Software Dependencies | No | Table 1 lists hyperparameters for the Adam optimizer and the agent update frequency, but the paper does not specify version numbers for core software components such as Python, PyTorch, TensorFlow, or CUDA libraries, which are essential for reproducibility.
Experiment Setup | Yes | In the unsupervised training stage, each method is trained for 2M steps with its intrinsic reward. Then we randomly sample a skill as the policy condition and fine-tune the policy for 100K steps in each downstream task for fast adaptation. We run 10 random seeds for each baseline... Table 1 summarizes the hyperparameters of our method and the basic DDPG algorithm for all methods, including replay buffer capacity (10^6), mini-batch size (1024), discount (0.99), and learning rate (10^-4). These reported values are collected into a configuration sketch after the table.
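The pseudocode row above refers to Algorithm 1 (unsupervised pretraining) and Algorithm 2 (downstream finetuning) in the paper. As a reading aid only, here is a minimal Python sketch of that two-stage structure; the agent, environment, and buffer helpers are hypothetical placeholders and are not taken from the released repository.

# Schematic sketch of the two-stage pipeline the paper describes
# (Algorithm 1: unsupervised pretraining; Algorithm 2: downstream finetuning).
# All helper names (agent, env, buffer and their methods) are hypothetical
# placeholders and are NOT taken from the official CeSD code.

PRETRAIN_STEPS = 2_000_000   # "2M steps" of pretraining with the intrinsic reward
FINETUNE_STEPS = 100_000     # "100K steps" of finetuning per downstream task


def pretrain(agent, env, buffer):
    """Algorithm 1 (sketch): explore with the intrinsic reward only."""
    obs, skill = env.reset(), agent.sample_skill()
    for _ in range(PRETRAIN_STEPS):
        action = agent.act(obs, skill)
        next_obs, _, done, _ = env.step(action)            # extrinsic reward ignored
        r_int = agent.intrinsic_reward(obs, action, next_obs, skill)
        buffer.add(obs, action, r_int, next_obs, skill)
        agent.update(buffer)                                # DDPG-style update
        obs, skill = (env.reset(), agent.sample_skill()) if done else (next_obs, skill)


def finetune(agent, env, buffer):
    """Algorithm 2 (sketch): fix one sampled skill, adapt on the task reward."""
    skill = agent.sample_skill()                            # "randomly sample a skill"
    obs = env.reset()
    for _ in range(FINETUNE_STEPS):
        action = agent.act(obs, skill)
        next_obs, r_ext, done, _ = env.step(action)         # downstream task reward
        buffer.add(obs, action, r_ext, next_obs, skill)
        agent.update(buffer)
        obs = env.reset() if done else next_obs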
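The Open Datasets row states that URLB has three DMC domains with four downstream tasks each. For reference, that grid is written out below; the individual task names follow the standard URLB suite (Laskin et al., 2021) and are not listed in the quoted passage itself.

# URLB evaluation grid: 3 DMC domains x 4 downstream tasks each.
# Only the domain names and the "four tasks per domain" count appear in the
# quoted text; the task names below follow the standard URLB suite.
URLB_TASKS = {
    "walker":    ["stand", "walk", "run", "flip"],
    "quadruped": ["stand", "walk", "run", "jump"],
    "jaco":      ["reach_top_left", "reach_top_right",
                  "reach_bottom_left", "reach_bottom_right"],
}

assert all(len(tasks) == 4 for tasks in URLB_TASKS.values())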
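Finally, the hyperparameters quoted in the Software Dependencies and Experiment Setup rows can be gathered into a single configuration sketch. Only the values explicitly reported (the paper's Table 1 and the quoted setup text) are filled in; the field names themselves are illustrative and do not come from the released code.

from dataclasses import dataclass


@dataclass
class CeSDConfig:
    """Hyperparameters reported for CeSD and the shared DDPG backbone.
    Field names are illustrative; values are those quoted above."""
    replay_buffer_capacity: int = 10**6   # replay buffer capacity
    batch_size: int = 1024                # mini-batch size
    discount: float = 0.99                # discount factor
    learning_rate: float = 1e-4           # Adam learning rate
    pretrain_steps: int = 2_000_000       # unsupervised pretraining (2M steps)
    finetune_steps: int = 100_000         # finetuning per downstream task (100K steps)
    num_seeds: int = 10                   # random seeds per baseline
    backbone: str = "DDPG"                # base RL algorithm used by all methods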