Meta-Learning with Latent Embedding Optimization

Authors: Andrei A. Rusu, Dushyant Rao, Jakub Sygnowski, Oriol Vinyals, Razvan Pascanu, Simon Osindero, Raia Hadsell

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation shows that LEO can achieve state-of-the-art performance on the competitive miniImageNet and tieredImageNet few-shot classification tasks. Further analysis indicates LEO is able to capture uncertainty in the data, and can perform adaptation more effectively by optimizing in latent space. We demonstrate that LEO achieves state-of-the-art results on both the miniImageNet and tieredImageNet datasets, and run an ablation study and further analysis to show that both conditional parameter generation and optimization in latent space are critical for the success of the method.
Researcher Affiliation | Industry | DeepMind, London, UK {andreirusu, dushyantr, sygi, vinyals, razp, osindero, raia}@google.com
Pseudocode | Yes | Algorithm 1: Latent Embedding Optimization
Open Source Code | Yes | Source code for our experiments is available at https://github.com/deepmind/leo.
Open Datasets | Yes | The miniImageNet dataset (Vinyals et al., 2016) is a subset of 100 classes selected randomly from the ILSVRC-12 dataset (Russakovsky et al., 2014)... The tieredImageNet dataset (Ren et al., 2018) is a larger subset of ILSVRC-12 with 608 classes...
Dataset Splits | Yes | We define the N-way K-shot problem using the episodic formulation of Vinyals et al. (2016). Each task instance Ti is a classification problem sampled from a task distribution p(T). The tasks are divided into a training meta-set Str, validation meta-set Sval, and test meta-set Stest, each with a disjoint set of target classes... The validation meta-set is used for model selection, and the testing meta-set is used only for final evaluation. Each task instance Ti ∼ p(T) is composed of a training set Dtr and validation set Dval...
Hardware Specification | No | Training of the image extractor was more compute-intensive, taking 5 hours for miniImageNet and around a day for tieredImageNet using 32 GPUs. This mentions the number of GPUs but not the specific models (e.g., NVIDIA V100 or A100), which is not specific enough for full reproducibility.
Software Dependencies | No | The paper mentions using Adam (Kingma & Ba, 2014) for optimization, but does not provide specific version numbers for any software libraries, frameworks (like TensorFlow or PyTorch), or programming languages used.
Experiment Setup | Yes | We used a 3-layer MLP as the underlying model architecture of fθ... The encoder was a 3-layer MLP with 32 units per layer... The relation network and decoder were both 3-layer MLPs with 32 units per layer. (Appendix A.2) Within the LEO inner loop we perform 5 steps of adaptation in latent space, followed by 5 steps of fine-tuning in parameter space. The learning rates for these spaces were meta-learned... after being initialized to 1 and 0.001 for the latent and parameter spaces respectively. (Appendix B.4) Table 6: Values of hyperparameters chosen to maximize meta-validation accuracy during random search. Lists specific values for η, γ, β, λ1, λ2, and pkeep.
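
To make the Dataset Splits row concrete, below is a minimal Python sketch of the N-way K-shot episodic formulation quoted above (Vinyals et al., 2016). The function name sample_episode and the meta-set dictionary are illustrative assumptions, not names from the LEO codebase.

```python
import random

def sample_episode(meta_set, n_way=5, k_shot=1, n_query=15):
    """Sample one task instance Ti ~ p(T) from a meta-set.

    `meta_set` maps each class label to a list of examples; the training,
    validation, and test meta-sets have disjoint sets of target classes.
    """
    classes = random.sample(list(meta_set), n_way)
    support, query = [], []  # the task's Dtr and Dval
    for new_label, cls in enumerate(classes):
        examples = random.sample(meta_set[cls], k_shot + n_query)
        support += [(x, new_label) for x in examples[:k_shot]]
        query += [(x, new_label) for x in examples[k_shot:]]
    return support, query
```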
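
The Pseudocode and Experiment Setup rows describe the LEO inner loop: 5 adaptation steps in latent space followed by 5 fine-tuning steps in parameter space, with learning rates that are themselves meta-learned after being initialized to 1 and 0.001. The following is a minimal PyTorch sketch under those assumptions; `encoder`, `decoder`, and `loss_fn` are placeholder modules, not the authors' implementation (their released code is at https://github.com/deepmind/leo).

```python
import torch

def leo_inner_loop(encoder, decoder, loss_fn, support,
                   latent_steps=5, finetune_steps=5,
                   lr_latent=1.0, lr_param=1e-3):
    # Encode the task's support set Dtr into an initial latent code z.
    z = encoder(support)
    # Adaptation in latent space: differentiate the support loss w.r.t. z.
    for _ in range(latent_steps):
        theta = decoder(z)                  # conditional parameter generation
        loss = loss_fn(theta, support)
        grad, = torch.autograd.grad(loss, z, create_graph=True)
        z = z - lr_latent * grad
    # Fine-tuning directly in parameter space on the decoded parameters.
    theta = decoder(z)
    for _ in range(finetune_steps):
        loss = loss_fn(theta, support)
        grad, = torch.autograd.grad(loss, theta, create_graph=True)
        theta = theta - lr_param * grad
    return theta  # adapted parameters; the outer loop evaluates them on Dval
```

The `create_graph=True` calls keep the inner-loop updates differentiable, which is what allows quantities such as the learning rates to be meta-learned in the outer loop.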