Bootstrapped Meta-Learning
Authors: Sebastian Flennerhag, Yannick Schroecker, Tom Zahavy, Hado van Hasselt, David Silver, Satinder Singh
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that BMG provides substantial performance improvements over standard meta-gradients in various settings. We obtain a new state-of-the-art result for model-free agents on Atari (Section 5.2) and improve upon MAML (Finn et al., 2017) in the few-shot setting (Section 6). |
| Researcher Affiliation | Industry | Sebastian Flennerhag (DeepMind, flennerhag@google.com); Yannick Schroecker (DeepMind); Tom Zahavy (DeepMind); Hado van Hasselt (DeepMind); David Silver (DeepMind); Satinder Singh (DeepMind) |
| Pseudocode | Yes | Algorithm 1: N-step RL actor loop (a generic, hedged sketch of such a loop is given below the table) |
| Open Source Code | No | No explicit statement about the authors' own source code being released or available through a link. |
| Open Datasets | Yes | Mini Imagenet (Vinyals et al., 2016; Ravi & Larochelle, 2017) is a sub-sample of the Imagenet dataset (Deng et al., 2009). Specifically, it is a subset of 100 classes sampled randomly from the 1000 classes in the ILSVRC-12 training set, with 600 images for each class. We follow the standard protocol (Ravi & Larochelle, 2017) and split classes into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes, respectively. The dataset is licensed under the MIT licence and the ILSVRC licence. The dataset can be obtained from https://paperswithcode.com/dataset/miniimagenet-1. |
| Dataset Splits | Yes | We follow the standard protocol (Ravi & Larochelle, 2017) and split classes into non-overlapping meta-training, meta-validation, and meta-test sets with 64, 16, and 20 classes, respectively (an illustrative split sketch is given below the table). |
| Hardware Specification | Yes | IMPALA's distributed setup is implemented on a single machine with 56 CPU cores and 8 TPU (Jouppi et al., 2017) cores. 2 TPU cores are used to act in 48 environments asynchronously in parallel, sending rollouts to a replay buffer that a centralized learner uses to update agent parameters and meta-parameters. Gradient computations are distributed along the batch dimension across the remaining 6 TPU cores. All Atari experiments use this setup; training for 200 million frames takes 24 hours. Each model is trained on a single machine and runs on an NVIDIA V100 GPU. |
| Software Dependencies | No | No specific version numbers for software dependencies (e.g., Python, PyTorch, TensorFlow, etc.) are provided. |
| Experiment Setup | Yes | Table 1 (two-colors hyper-parameters) and Table 2 (Atari hyper-parameters) give detailed lists of hyperparameters for optimizers, network architectures, and RL-specific settings. For instance, the two-colors actor-critic inner learner uses the SGD optimiser with a learning rate of 0.1, a batch size of 16 (losses are averaged), and γ = 0.99 (a hedged configuration sketch follows the table). |
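The pseudocode reported above, Algorithm 1 in the paper, describes an N-step RL actor loop. The sketch below is a minimal, hypothetical illustration of such a loop, not the authors' implementation; the `env`, `agent`, and `learner` interfaces (Gym-style `env.step`, `agent.act`, `learner.update`) are assumptions made for clarity.

```python
# Minimal sketch of a generic N-step RL actor loop (illustrative only; not the
# authors' Algorithm 1). The agent acts for N steps, then the learner performs
# an update on the collected rollout before acting resumes.
def n_step_actor_loop(env, agent, learner, n_steps, total_steps):
    obs = env.reset()
    rollout = []
    for _ in range(total_steps):
        action = agent.act(obs)                       # sample from current policy
        next_obs, reward, done, _ = env.step(action)  # assumed Gym-style signature
        rollout.append((obs, action, reward, done))
        obs = env.reset() if done else next_obs
        if len(rollout) == n_steps:                   # every N steps...
            agent = learner.update(agent, rollout)    # ...apply an (inner) update
            rollout = []
    return agent
```

In BMG, the update rule applied inside such a loop is itself adjusted by the meta-learner via a bootstrapped target; that machinery is outside the scope of this sketch.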
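As a companion to the dataset-splits row, the snippet below illustrates the 64/16/20 class split of mini-ImageNet's 100 classes. It is only an illustration: the standard protocol (Ravi & Larochelle, 2017) uses fixed, published class lists, whereas this sketch simply partitions a shuffled list of class indices.

```python
import random

# Illustrative 64/16/20 split of mini-ImageNet's 100 classes (600 images each).
# The real protocol uses the fixed class lists of Ravi & Larochelle (2017);
# the random shuffle here is only a stand-in for those lists.
classes = list(range(100))
random.Random(0).shuffle(classes)

meta_train = classes[:64]     # 64 meta-training classes
meta_valid = classes[64:80]   # 16 meta-validation classes
meta_test = classes[80:]      # 20 meta-test classes

assert (len(meta_train), len(meta_valid), len(meta_test)) == (64, 16, 20)
```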
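Finally, the example hyper-parameters quoted in the experiment-setup row can be grouped into a small configuration. The dictionary below is a hypothetical sketch recording only the values quoted from the paper's Table 1; the key names are ours, not the authors'.

```python
# Hypothetical configuration sketch for the two-colors experiment, recording
# only the hyper-parameter values quoted from Table 1 of the paper.
two_colors_config = {
    "optimiser": "SGD",       # inner-loop optimiser
    "learning_rate": 0.1,
    "batch_size": 16,         # losses are averaged over the batch
    "discount_gamma": 0.99,   # γ for the actor-critic inner learner
}
```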