Recasting Gradient-Based Meta-Learning as Hierarchical Bayes
Authors: Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, Thomas Griffiths
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTAL EVALUATION The goal of our experiments is to evaluate if we can use our probabilistic interpretation of MAML to generate samples from the distribution over adapted parameters, and furthermore, if our method can be applied to large-scale meta-learning problems such as miniImageNet. Table 1: One-shot classification performance on the miniImageNet test set, with comparison methods ordered by one-shot performance. All results are averaged over 600 test episodes, and we report 95% confidence intervals. |
| Researcher Affiliation | Academia | (1) Berkeley AI Research (BAIR), University of California, Berkeley; (2) Department of Electrical Engineering & Computer Sciences, University of California, Berkeley; (3) Department of Psychology, University of California, Berkeley |
| Pseudocode | Yes | Algorithm 2: Model-agnostic meta-learning as hierarchical Bayesian inference. Subroutine 3: Subroutine for computing a point estimate φ̂ using truncated gradient descent to approximate the marginal negative log likelihood (NLL). Subroutine 4: Subroutine for computing a Laplace approximation of the marginal likelihood. (A hedged code sketch of Subroutines 3 and 4 appears after the table.) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the described methodology. |
| Open Datasets | Yes | We evaluate LLAMA on the miniImageNet (Ravi & Larochelle, 2017) 1-shot, 5-way classification task, a standard benchmark in few-shot classification. |
| Dataset Splits | Yes | miniImageNet comprises 64 training classes, 12 validation classes, and 24 test classes. During training and for each task, 10 input datapoints are sampled uniformly from [−10.0, 10.0] and the loss is the mean squared error between the prediction and the true value. |
| Hardware Specification | Yes | In particular, our TensorFlow implementation of LLAMA trains for 60,000 iterations on one TITAN Xp GPU in 9 hours, compared to 5 hours to train MAML. |
| Software Dependencies | No | The paper mentions a "TensorFlow implementation" but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We use Adam (Kingma & Ba, 2014) as the meta-optimizer, and standard batch gradient descent with a fixed learning rate to update the model during fast adaptation. LLAMA requires the prior precision term τ as well as an additional parameter η ∈ ℝ⁺ that weights the regularization term log det Ĥ contributed by the Laplace approximation. We fix τ = 0.001 and select η = 10⁻⁶ via cross-validation; all other parameters are set to the values reported in Finn et al. (2017). (A hedged sketch wiring these values into a meta-update appears after the table.) |
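
The pseudocode row names Algorithm 2 and Subroutines 3–4, but no open-source code accompanies the paper, so the following is a minimal, hypothetical PyTorch sketch of the two subroutines as described: truncated gradient descent to a point estimate φ̂, and a Laplace-style log-determinant of a damped curvature term. The function names and the diagonal (squared-gradient) curvature surrogate are illustrative assumptions; the paper itself uses a Kronecker-factored approximation of the Hessian, which is not reproduced here.

```python
import torch

def fast_adapt(theta, loss_fn, support, inner_lr=0.01, inner_steps=5):
    """Truncated gradient descent from meta-parameters theta to a task-specific
    point estimate phi_hat (in the spirit of Subroutine 3; hypothetical sketch).
    Assumes each tensor in theta has requires_grad=True."""
    phi = [p.clone() for p in theta]
    for _ in range(inner_steps):
        loss = loss_fn(phi, support)
        grads = torch.autograd.grad(loss, phi, create_graph=True)
        phi = [p - inner_lr * g for p, g in zip(phi, grads)]
    return phi

def laplace_log_det(loss_fn, phi, support, tau=1e-3):
    """Log-determinant term of a Laplace approximation around phi_hat
    (in the spirit of Subroutine 4; hypothetical sketch). The curvature is a
    crude diagonal surrogate (squared gradients) damped by the prior precision
    tau; the paper uses a Kronecker-factored curvature instead."""
    loss = loss_fn(phi, support)
    grads = torch.autograd.grad(loss, phi, create_graph=True)
    diag = torch.cat([(g ** 2).reshape(-1) for g in grads]) + tau
    return torch.log(diag).sum()
```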
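
The experiment-setup row reports Adam as the meta-optimizer, fixed-learning-rate gradient descent for fast adaptation, a fixed prior precision τ = 0.001, and η = 10⁻⁶ weighting the log det Ĥ regularizer. As a hedged illustration only, the sketch below wires those values into a meta-update built on the hypothetical helpers above; `loss_fn`, the task sampler, and the meta-batch size are placeholders, not values confirmed by the paper.

```python
def meta_step(theta, optimizer, loss_fn, sample_task,
              tau=1e-3, eta=1e-6, meta_batch_size=4):
    """One meta-update: query-set NLL of the adapted parameters plus the
    eta-weighted Laplace log-det term (hypothetical sketch of the reported
    setup; reuses fast_adapt and laplace_log_det from the block above)."""
    optimizer.zero_grad()
    meta_loss = 0.0
    for _ in range(meta_batch_size):
        support, query = sample_task()   # placeholder episode sampler
        phi = fast_adapt(theta, loss_fn, support)
        nll = loss_fn(phi, query)        # surrogate for the marginal NLL
        meta_loss = meta_loss + nll + eta * laplace_log_det(loss_fn, phi, support, tau)
    meta_loss = meta_loss / meta_batch_size
    meta_loss.backward()                 # backpropagate through fast adaptation
    optimizer.step()
    return meta_loss.item()

# Illustrative usage, assuming theta = list(model.parameters()):
#   optimizer = torch.optim.Adam(theta, lr=1e-3)   # Adam as the meta-optimizer
#   meta_step(theta, optimizer, loss_fn, sample_task)
```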