Continuous-Time Meta-Learning with Forward Mode Differentiation
Authors: Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio, Guillaume Lajoie, Pierre-Luc Bacon
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically its efficiency in terms of runtime and memory usage, and we illustrate its effectiveness on a range of few-shot image classification problems. |
| Researcher Affiliation | Academia | Tristan Deleu, David Kanaa, Leo Feng, Giancarlo Kerg, Yoshua Bengio (1,2), Guillaume Lajoie (2), Pierre-Luc Bacon (2); Mila, Université de Montréal. (1) CIFAR Senior Fellow, (2) CIFAR AI Chair |
| Pseudocode | Yes | We give in Algorithm 1 the pseudo-code for meta-training COMLN, based on a distribution of tasks p(τ), with references to the relevant propositions developed in Appendices B and C. (A generic meta-training skeleton in this spirit is sketched after the table.) |
| Open Source Code | Yes | Code is available at: https://github.com/tristandeleu/jax-comln |
| Open Datasets | Yes | We evaluate COMLN on two standard few-shot image classification benchmarks: the miniImageNet (Vinyals et al., 2016) and tieredImageNet (Ren et al., 2018) datasets, both datasets being derived from ILSVRC-2012 (Russakovsky et al., 2015). |
| Dataset Splits | Yes | miniImageNet consists of 100 classes, split into 64 training classes, 16 validation classes, and 20 test classes. |
| Hardware Specification | Yes | The extrapolated dashed lines correspond to the method reaching the memory capacity of a Tesla V100 GPU with 32GB of memory. |
| Software Dependencies | No | The paper mentions 'JAX (Bradbury et al., 2018)' and 'Haiku (Hennigan et al., 2020)', but does not specify version numbers for these dependencies. It also refers to a '4th order Runge-Kutta method', a class of numerical ODE solvers, without naming the library (or version) implementing it. (A minimal solver sketch appears after the table.) |
| Experiment Setup | Yes | To compute the adapted parameters and the meta-gradients in COMLN, we integrate the dynamical system described in Section 4.2 with a 4th order Runge-Kutta method with a Dormand-Prince adaptive step size... Furthermore, to ensure that T > 0, we parametrized it with an exponential activation... For all methods and all datasets, we used SGD with momentum 0.9 and Nesterov acceleration, with a learning rate starting at 0.1 and decreasing according to the schedule provided by Lee et al. (2019). (An optimizer sketch appears after the table.) |
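
To make the meta-training structure referenced in the Pseudocode row concrete, here is a skeletal episodic loop in JAX. This is an illustrative sketch only, not the paper's Algorithm 1: `sample_task`, `adapt`, `outer_loss`, and `opt_update` are hypothetical callables standing in for task sampling from p(τ), COMLN's continuous-time adaptation, the query-set loss, and the outer optimizer step.

```python
import jax

def meta_train(meta_params, sample_task, adapt, outer_loss, opt_update,
               num_iterations):
    """Skeletal episodic meta-training loop (illustrative, not Algorithm 1).

    sample_task: () -> (support, query), one draw from the task distribution
    adapt:       (meta_params, support) -> adapted task parameters; COMLN
                 does this by integrating a gradient flow forward in time
    outer_loss:  (meta_params, adapted, query) -> scalar query-set loss
    opt_update:  (meta_params, grads) -> updated meta_params
    """
    for _ in range(num_iterations):
        support, query = sample_task()
        adapted = adapt(meta_params, support)
        # Meta-gradient of the query loss w.r.t. the meta-parameters. COMLN
        # obtains this with forward-mode differentiation rather than by
        # backpropagating through the adaptation trajectory.
        grads = jax.grad(outer_loss)(meta_params, adapted, query)
        meta_params = opt_update(meta_params, grads)
    return meta_params
```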
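
The Software Dependencies and Experiment Setup rows both mention a 4th-order Runge-Kutta solver: COMLN's inner-loop adaptation is the solution of a gradient-flow ODE that is integrated numerically. The fixed-step RK4 sketch below, for a linear classification head, shows what this integration looks like in JAX; the loss, shapes, and fixed step size are assumptions made for brevity (the paper uses a Dormand-Prince adaptive step size).

```python
import jax
import jax.numpy as jnp

def loss(w, features, labels):
    # Illustrative cross-entropy loss of a linear head on support embeddings.
    logits = features @ w
    return -jnp.mean(jax.nn.log_softmax(logits)[jnp.arange(labels.shape[0]),
                                                labels])

def gradient_flow(w0, features, labels, T, num_steps=20):
    """Integrate the gradient flow dw/dt = -grad L(w) from t=0 to t=T
    with fixed-step RK4 (COMLN itself uses an adaptive-step scheme)."""
    f = lambda w: -jax.grad(loss)(w, features, labels)
    h = T / num_steps

    def rk4_step(w, _):
        k1 = f(w)
        k2 = f(w + 0.5 * h * k1)
        k3 = f(w + 0.5 * h * k2)
        k4 = f(w + h * k3)
        return w + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4), None

    w_T, _ = jax.lax.scan(rk4_step, w0, None, length=num_steps)
    return w_T

# Example: adapt a 5-way linear head on 25 random 64-dim support embeddings.
key = jax.random.PRNGKey(0)
features = jax.random.normal(key, (25, 64))
labels = jnp.repeat(jnp.arange(5), 5)
w_adapted = gradient_flow(jnp.zeros((64, 5)), features, labels, T=1.0)
```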
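
Finally, two concrete choices from the Experiment Setup row are easy to mirror in code: the horizon T is kept positive through an exponential parametrization, and the outer loop uses SGD with Nesterov momentum 0.9 starting from a learning rate of 0.1. The sketch below uses optax; the schedule boundaries are placeholders, since the paper defers to the schedule of Lee et al. (2019), and nothing here is the authors' actual configuration.

```python
import jax.numpy as jnp
import optax

# T > 0 is guaranteed by meta-learning an unconstrained scalar and mapping
# it through exp (variable names are illustrative).
log_T = jnp.array(0.0)      # unconstrained meta-parameter, so T starts at 1
T = jnp.exp(log_T)          # integration horizon, positive by construction

# SGD with momentum 0.9 and Nesterov acceleration, learning rate starting
# at 0.1. The decay boundaries below are placeholders, not the schedule of
# Lee et al. (2019) that the paper actually follows.
schedule = optax.piecewise_constant_schedule(
    init_value=0.1,
    boundaries_and_scales={20_000: 0.1, 25_000: 0.1},
)
optimizer = optax.sgd(learning_rate=schedule, momentum=0.9, nesterov=True)
```

In a training loop, `optimizer.init(params)` creates the momentum state and `optimizer.update(grads, state, params)` produces the parameter updates, which could serve as the `opt_update` callable in the skeleton above.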