Choreographer: Learning and Adapting Skills in Imagination

Authors: Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Alexandre Lacoste, Sai Rajeswar

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate Choreographer to answer the following questions: Does Choreographer learn and adapt skills effectively for unsupervised RL? (Section 4.1) We use the URL benchmark (Laskin et al., 2021) to show that, after pre-training on exploration data, our agent can adapt to several tasks in a data-efficient manner. We show this holds both for training on state and pixel inputs and in both offline and online settings.
Researcher Affiliation | Collaboration | 1 Ghent University - imec, 2 ServiceNow Research
Pseudocode | Yes | We report in the appendix the detailed pseudo-code for both Choreographer (Algorithm 2) and code resampling (Algorithm 1). (A minimal sketch of the code-resampling idea is given after the table.)
Open Source Code | Yes | Project website: https://skillchoreographer.github.io/ [...] The code is publicly available through the project website.
Open Datasets | Yes | We use the URL benchmark (Laskin et al., 2021) to show that, after pre-training on exploration data, our agent can adapt to several tasks in a data-efficient manner. [...] In the URLB, there are three domains from the DM Control Suite (Tassa et al., 2018), Walker, Quadruped, and Jaco, with four downstream tasks to solve per domain. [...] We zoom in on the Jaco sparse tasks from URLB and use sparse goal-reaching tasks from Meta-World (Yu et al., 2019) to evaluate the ability of our agent to find sparse rewards in the environment by leveraging its skills.
Dataset Splits | Yes | The experimental setup consists of two phases: a longer data-collection/pre-training phase, in which the agent has up to 2M environment steps to interact with the environment without rewards, and a shorter fine-tuning phase, in which the agent has 100k steps to interact with a task-specific version of the environment, where it must both discover rewards and solve a downstream task. (See the protocol sketch after the table.)
Hardware Specification | Yes | In the URLB paper, they claim a training time of about 12 hours for 2M frames of pre-training using V100 GPUs for their model-free agents. Using the same GPUs, we can pre-train a Choreographer agent in about 44 hours.
Software Dependencies | No | The paper mentions software like 'DreamerV2' and 'Adam' but does not specify version numbers for these or other key software dependencies (e.g., 'Python 3.8', 'PyTorch 1.9', 'CUDA 11.1').
Experiment Setup | Yes | Implementation details and hyperparameters are provided in Appendix B and the open-source code is available on the project website. [...] We use Adam with learning rate 3 × 10^-4 for the updates, clipping gradient norms to 100. [...] The actor and critic networks are instantiated as 4-layer MLPs with a dimensionality of 400 and updated using Adam with a learning rate of 8 × 10^-5 and gradients clipped to 100. [...] The codebook consists of 64 codes of dimensionality 16. Code resampling happens every M = 200 training batches. (A hyperparameter sketch follows the table.)
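
For context on the code-resampling procedure referenced in the Pseudocode row (Algorithm 1 in the paper's appendix), the following is a minimal sketch of the general idea: skill codes that went unused over the last M training batches are re-initialized to encoder outputs from the current batch, which counteracts codebook collapse. The names (resample_unused_codes, usage_counts) are illustrative, not the authors' API, and the exact selection criterion may differ from Algorithm 1.

```python
# Hedged sketch of codebook "code resampling": codes not selected since the
# last resampling step are re-initialized to randomly chosen encoder outputs
# from the current batch. Not the authors' implementation.
import torch

@torch.no_grad()
def resample_unused_codes(codebook: torch.Tensor,
                          usage_counts: torch.Tensor,
                          encoder_outputs: torch.Tensor) -> torch.Tensor:
    """codebook: (K, D) code vectors; usage_counts: (K,) selection counts since
    the last resampling; encoder_outputs: (B, D) latents from the current batch."""
    unused = torch.nonzero(usage_counts == 0).squeeze(-1)
    if unused.numel() > 0:
        # draw one encoder output (with replacement) per unused code
        idx = torch.randint(0, encoder_outputs.shape[0], (unused.numel(),))
        codebook[unused] = encoder_outputs[idx]
    return codebook
```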
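
The two-phase URLB protocol quoted in the Dataset Splits row can be summarized with the sketch below. The agent methods and the make_env helper are hypothetical placeholders; only the step budgets (up to 2M reward-free pre-training steps, 100k fine-tuning steps) come from the quoted text.

```python
# Hedged sketch of the two-phase URLB evaluation protocol described above.
PRETRAIN_STEPS = 2_000_000   # reward-free exploration / skill pre-training
FINETUNE_STEPS = 100_000     # task-specific adaptation budget

def urlb_protocol(agent, make_env, domain: str, task: str):
    # Phase 1: interact without rewards, collecting exploration data
    env = make_env(domain, task=None)          # reward-free environment
    agent.pretrain(env, num_steps=PRETRAIN_STEPS)

    # Phase 2: rewards are exposed; the agent must discover and exploit them
    task_env = make_env(domain, task=task)     # task-specific environment
    return agent.finetune(task_env, num_steps=FINETUNE_STEPS)
```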
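
The hyperparameters quoted in the Experiment Setup row can be wired together as in the sketch below. This is not the authors' code: the input/output sizes and the ELU activation are assumptions (the latter following the DreamerV2 convention the paper builds on); the network width and depth, learning rates, gradient-clipping threshold, and codebook sizes are taken from the quoted text.

```python
# Hedged sketch of the reported hyperparameters: 4-layer MLPs of width 400
# for actor/critic, Adam with lr 8e-5 and gradient-norm clipping at 100,
# and a skill codebook of 64 codes of dimension 16 resampled every 200 batches.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, width=400, hidden_layers=4):
    blocks, d = [], in_dim
    for _ in range(hidden_layers):
        blocks += [nn.Linear(d, width), nn.ELU()]   # ELU is an assumption
        d = width
    blocks.append(nn.Linear(d, out_dim))
    return nn.Sequential(*blocks)

actor = mlp(in_dim=230, out_dim=6)      # input/output sizes are illustrative
critic = mlp(in_dim=230, out_dim=1)
actor_opt = torch.optim.Adam(actor.parameters(), lr=8e-5)

codebook = nn.Embedding(64, 16)          # 64 skill codes of dimensionality 16
RESAMPLE_EVERY = 200                     # code resampling every M = 200 batches

# Inside an update step (illustrative):
# loss.backward()
# torch.nn.utils.clip_grad_norm_(actor.parameters(), max_norm=100)
# actor_opt.step()
```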