Recasting Gradient-Based Meta-Learning as Hierarchical Bayes

Authors: Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, Thomas Griffiths

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTAL EVALUATION: The goal of our experiments is to evaluate if we can use our probabilistic interpretation of MAML to generate samples from the distribution over adapted parameters, and furthermore, if our method can be applied to large-scale meta-learning problems such as miniImageNet. Table 1: One-shot classification performance on the miniImageNet test set, with comparison methods ordered by one-shot performance. All results are averaged over 600 test episodes, and we report 95% confidence intervals.
Researcher Affiliation | Academia | 1 Berkeley AI Research (BAIR), University of California, Berkeley; 2 Department of Electrical Engineering & Computer Sciences, University of California, Berkeley; 3 Department of Psychology, University of California, Berkeley
Pseudocode | Yes | Algorithm 2: Model-agnostic meta-learning as hierarchical Bayesian inference. Subroutine 3: Subroutine for computing a point estimate φ̂ using truncated gradient descent to approximate the marginal negative log likelihood (NLL). Subroutine 4: Subroutine for computing a Laplace approximation of the marginal likelihood. (See the adaptation sketch after the table.)
Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology.
Open Datasets | Yes | We evaluate LLAMA on the miniImageNet (Ravi & Larochelle, 2017) 1-shot, 5-way classification task, a standard benchmark in few-shot classification.
Dataset Splits | Yes | miniImageNet comprises 64 training classes, 12 validation classes, and 24 test classes. During training and for each task, 10 input datapoints are sampled uniformly from [-10.0, 10.0] and the loss is the mean squared error between the prediction and the true value. (A sketch of this sinusoid regression setup appears after the table.)
Hardware Specification | Yes | In particular, our TensorFlow implementation of LLAMA trains for 60,000 iterations on one TITAN Xp GPU in 9 hours, compared to 5 hours to train MAML.
Software Dependencies | No | The paper mentions a "TensorFlow implementation" but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | We use Adam (Kingma & Ba, 2014) as the meta-optimizer, and standard batch gradient descent with a fixed learning rate to update the model during fast adaptation. LLAMA requires the prior precision term τ as well as an additional parameter η ∈ ℝ+ that weights the regularization term log det Ĥ contributed by the Laplace approximation. We fix τ = 0.001 and select η = 10^-6 via cross-validation; all other parameters are set to the values reported in Finn et al. (2017). (A sketch of this training setup appears after the table.)
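
Adaptation sketch. The subroutines named in the Pseudocode row can be illustrated with a minimal sketch. This is not the authors' code: it assumes a toy linear-regression task so the curvature can be written in closed form, and the names (adapt_point_estimate, laplace_nll), learning rate, and step count are illustrative assumptions.

# Minimal sketch of Subroutines 3-4 on a toy linear-regression task (NumPy).
# All names and hyperparameter values here are illustrative, not from the paper.
import numpy as np

def task_nll(phi, X, y):
    # Squared-error negative log likelihood (up to constants) for a linear model.
    resid = X @ phi - y
    return 0.5 * np.sum(resid ** 2)

def task_grad(phi, X, y):
    return X.T @ (X @ phi - y)

def adapt_point_estimate(theta, X, y, lr=0.01, n_steps=5):
    # Subroutine 3 (sketch): truncated gradient descent from the meta-parameters
    # theta to a task-specific point estimate phi_hat.
    phi = theta.copy()
    for _ in range(n_steps):
        phi -= lr * task_grad(phi, X, y)
    return phi

def laplace_nll(theta, X, y, tau=1e-3, eta=1e-6, lr=0.01, n_steps=5):
    # Subroutine 4 (sketch): Laplace approximation of the marginal NLL, combining
    # the NLL at phi_hat, a Gaussian prior with precision tau centred at theta,
    # and an eta-weighted log-determinant of the curvature.
    phi_hat = adapt_point_estimate(theta, X, y, lr, n_steps)
    H = X.T @ X + tau * np.eye(len(theta))   # curvature of the NLL plus the prior
    sign, logdet = np.linalg.slogdet(H)
    return (task_nll(phi_hat, X, y)
            + 0.5 * tau * np.sum((phi_hat - theta) ** 2)
            + eta * logdet)

# Usage: evaluate the approximate marginal NLL for one synthetic task.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(10, 3)), rng.normal(size=10)
theta = np.zeros(3)
print(laplace_nll(theta, X, y))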
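
Sinusoid sketch. A minimal sketch of the sinusoid regression setup quoted in the Dataset Splits row. The input range [-10.0, 10.0], the 10 datapoints per task, and the MSE loss follow the quoted text; the amplitude and phase ranges are assumptions added for illustration.

# Sketch of per-task data generation for the sinusoid regression setup.
import numpy as np

def sample_sinusoid_task(rng, n_points=10):
    amplitude = rng.uniform(0.1, 5.0)          # assumed range, for illustration
    phase = rng.uniform(0.0, np.pi)            # assumed range, for illustration
    x = rng.uniform(-10.0, 10.0, size=n_points)  # 10 inputs from [-10.0, 10.0]
    y = amplitude * np.sin(x + phase)
    return x, y

def mse_loss(pred, target):
    return np.mean((pred - target) ** 2)

rng = np.random.default_rng(0)
x, y = sample_sinusoid_task(rng)
print(mse_loss(np.zeros_like(y), y))           # loss of a trivial zero predictor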
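
Training-setup sketch. A minimal sketch of the setup quoted in the Experiment Setup row, assuming PyTorch and a toy linear model: Adam as the meta-optimizer, fixed-learning-rate gradient descent for fast adaptation, τ = 0.001, and a small η weighting the Laplace log-determinant term. The learning rates, step counts, and task sampler are assumptions, not values from the paper.

# Hedged sketch (not the authors' implementation) of meta-training with Adam on a
# Laplace-approximated objective, using a toy linear model so the curvature is X^T X.
import torch

tau, eta = 1e-3, 1e-6                   # values quoted in the row above
inner_lr, n_inner_steps = 0.01, 5       # assumed inner-loop settings

theta = torch.zeros(3, requires_grad=True)        # meta-parameters
meta_opt = torch.optim.Adam([theta], lr=1e-3)     # Adam meta-optimizer

def task_nll(phi, X, y):
    return 0.5 * torch.sum((X @ phi - y) ** 2)

def sample_task():
    # Toy linear-regression task; stands in for the paper's task distribution.
    X = torch.randn(10, 3)
    w = torch.randn(3)
    return X, X @ w + 0.1 * torch.randn(10)

for step in range(100):
    X, y = sample_task()
    # Fast adaptation: truncated gradient descent from theta, kept differentiable
    # so meta-gradients flow back through the inner updates.
    phi = theta
    for _ in range(n_inner_steps):
        grad = torch.autograd.grad(task_nll(phi, X, y), phi, create_graph=True)[0]
        phi = phi - inner_lr * grad
    # Laplace-approximated marginal NLL: data term + Gaussian prior with precision
    # tau + eta-weighted log-determinant of the curvature.
    H = X.T @ X + tau * torch.eye(3)
    meta_loss = (task_nll(phi, X, y)
                 + 0.5 * tau * torch.sum((phi - theta) ** 2)
                 + eta * torch.logdet(H))
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()

Because the toy model is linear, the curvature X^T X does not depend on the parameters, so the log-det term is constant here; in the paper the curvature is estimated from the task NLL at the adapted point estimate φ̂.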