Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Choreographer: Learning and Adapting Skills in Imagination
Authors: Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Alexandre Lacoste, Sai Rajeswar
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate Choreographer to answer the following questions: Does Choreographer learn and adapt skills effectively for unsupervised RL? (Section 4.1) We use the URL benchmark (Laskin et al., 2021) to show that, after pre-training on exploration data, our agent can adapt to several tasks in a data-efficient manner. We show this holds for both training on states and pixels inputs and in both offline and online settings. |
| Researcher Affiliation | Collaboration | 1 Ghent University - imec, 2 ServiceNow Research |
| Pseudocode | Yes | We reported in appendix the detailed pseudo-code for both Choreographer (Algorithm 2) and code-resampling (Algorithm 1). |
| Open Source Code | Yes | Project website: https://skillchoreographer.github.io/ [...] The code is publicly available through the project website. |
| Open Datasets | Yes | We use the URL benchmark (Laskin et al., 2021) to show that, after pre-training on exploration data, our agent can adapt to several tasks in a data-efficient manner. [...] In the URLB, there are three domains from the DM Control Suite (Tassa et al., 2018), Walker, Quadruped, and Jaco, with four downstream tasks to solve per each domain. [...] We zoom in on the Jaco sparse tasks from URLB and use sparse goal-reaching tasks from Meta-World (Yu et al., 2019) to evaluate the ability of our agent to find sparse rewards in the environment, by leveraging its skills. |
| Dataset Splits | Yes | The experimental setup consists of two phases: a longer data collection/pre-training phase, where the agent has up to 2M environment steps to interact with the environment without rewards, and a shorter fine-tuning phase, where the agent has 100k steps to interact with a task-specific version of the environment, where it should both discover rewards and solve a downstream task. |
| Hardware Specification | Yes | In the URLB paper, they claim a training time of about 12 hours for 2M frames pre-training using V100 GPUs for their model-free agents. Using the same GPUs, we can pre-train a Choreographer agent in about 44 hours. |
| Software Dependencies | No | The paper mentions software like 'Dreamer V2' and 'Adam' but does not specify version numbers for these or other key software dependencies (e.g., 'Python 3.8', 'PyTorch 1.9', 'CUDA 11.1'). |
| Experiment Setup | Yes | Implementation details and hyperparameters are provided in Appendix B and the open-source code is available on the project website. [...] We use Adam with learning rate $3 \times 10^{-4}$ for the updates, and clipping gradients norm to 100. [...] The actor and critic networks are instantiated as 4-layer MLP with a dimensionality of 400 and updated using Adam with a learning rate of $8 \times 10^{-5}$ and gradients clipped to 100. [...] The codebook consists of 64 codes of dimensionality 16. Code resampling happens every M = 200 training batches. |
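
The hyperparameters quoted in the Experiment Setup and Dataset Splits rows can be gathered into a single configuration object for anyone attempting a reproduction. The sketch below is illustrative only: the class name, field names, and the `should_resample` helper are hypothetical and are not taken from the authors' code. The row does not say which component the first Adam learning rate applies to, so the field is named generically, and the choice to skip resampling at batch 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChoreographerSetup:
    """Hypothetical container for the values quoted in the table above."""

    # "We use Adam with learning rate 3e-4 ... clipping gradients norm to 100."
    # The quoted text does not name the component these apply to.
    base_lr: float = 3e-4
    base_grad_clip: float = 100.0

    # Actor and critic: 4-layer MLPs of width 400, Adam at 8e-5, clip at 100.
    actor_critic_lr: float = 8e-5
    actor_critic_grad_clip: float = 100.0
    mlp_layers: int = 4
    mlp_units: int = 400

    # Codebook: 64 codes of dimensionality 16, resampled every M = 200 batches.
    codebook_size: int = 64
    code_dim: int = 16
    resample_every: int = 200

    # Two-phase protocol from the Dataset Splits row.
    pretrain_steps: int = 2_000_000
    finetune_steps: int = 100_000


def should_resample(batch_index: int, cfg: ChoreographerSetup) -> bool:
    """Return True on every M-th training batch (assumed: never at batch 0)."""
    return batch_index > 0 and batch_index % cfg.resample_every == 0


if __name__ == "__main__":
    cfg = ChoreographerSetup()
    total_entries = cfg.codebook_size * cfg.code_dim
    print(f"codebook entries: {total_entries}")
    print(f"resample at batch 200: {should_resample(200, cfg)}")
```

Keeping the values in one frozen dataclass makes it easy to compare them against Appendix B of the paper and the released code, which remain the authoritative sources.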