Behaviour Distillation

Authors: Andrei Lupu, Chris Lu, Jarek Luca Liesen, Robert Tjarko Lange, Jakob Nicolaus Foerster

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate empirically that HaDES can produce effective synthetic datasets for challenging discrete and continuous control environments that generalize to training policies with a large range of architectures and hyperparameters (Section 5.2); We use the synthetic datasets for a downstream task: quickly training a multi-task agent from datasets produced for individual environments (Section 5.3); We achieve SoTA for a common dataset distillation benchmark with HaDES (Section 5.4).
Researcher Affiliation | Collaboration | Andrei Lupu (1,2), Chris Lu (1), Jarek Liesen (3), Robert Tjarko Lange (3) & Jakob Foerster (1); (1) University of Oxford, (2) Meta AI, (3) Technical University Berlin
Pseudocode | Yes | We provide pseudocode in Algorithm 1. Algorithm 1: HaDES
Open Source Code | Yes | We open-source our code and synthetic datasets under https://github.com/FLAIROx/behaviour-distillation. We also open-source our code and our synthetic datasets at https://github.com/FLAIROx/behaviour-distillation.
Open Datasets | Yes | For all RL tasks we use Brax (Freeman et al., 2021), a suite of continuous control environments, and MinAtar (Young & Tian, 2019), a set of Atari-like environments. For dataset distillation, we report results on two image classification tasks: MNIST (LeCun, 1998), which is composed of handwritten digits, and Fashion MNIST (Xiao et al., 2017), which features different clothing items.
Dataset Splits | No | The paper does not explicitly state training, validation, and test splits (e.g., 80/10/10%). While it mentions a 'test set', it does not provide details on how the data was partitioned for training and validation.
Hardware Specification | Yes | All of our runs use 8 Nvidia V100 GPUs and take between 1 and 17 seconds per outer loop generation.
Software Dependencies | No | The paper states: "We implement our algorithm in JAX (Bradbury et al., 2018) using the PureJaxRL (Lu et al., 2022), gymnax (Lange, 2022) and evosax (Lange, 2023) libraries". While the libraries are named, specific version numbers for these libraries (e.g., JAX 0.X.Y) are not provided, only the year of the accompanying publication.
Experiment Setup | Yes | Table 2: Hyperparameters for HaDES in Brax. Top: inner loop parameters. Bottom: Outer loop parameters. Table 3: Hyperparameters for HaDES in MinAtar. Top: inner loop parameters. Bottom: Outer loop parameters.
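
The Software Dependencies entry notes that the libraries are named but not versioned. As a minimal sketch of how someone rerunning the released code might record the versions they actually used, the snippet below queries the installed distributions with Python's standard `importlib.metadata`. The distribution names `jax`, `gymnax` and `evosax` are assumptions based on the library names quoted above; PureJaxRL is left out because the source does not say how it is installed. The helper name `installed_versions` is hypothetical and is not part of the paper's code.

```python
from importlib import metadata

# Distribution names assumed from the libraries named in the paper's quote.
# They may differ from the names used in a given environment.
PACKAGES = ["jax", "gymnax", "evosax"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        if version is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}=={version}")
```

Saving this output alongside experiment results would give later readers the pinned versions that the paper itself does not report.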