Optimization as a Model for Few-Shot Learning
Authors: Sachin Ravi, Hugo Larochelle
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that this meta-learning model is competitive with deep metric-learning techniques for few-shot learning. In this section, we describe the results of experiments, examining the properties of our model and comparing our method's performance against different approaches. |
| Researcher Affiliation | Collaboration | Sachin Ravi and Hugo Larochelle, Twitter, Cambridge, USA, {sachinr,hugo}@twitter.com. Work done as an intern at Twitter. Sachin is a PhD student at Princeton University and can be reached at sachinr@princeton.edu. |
| Pseudocode | Yes | Algorithm 1 Train Meta-Learner |
| Open Source Code | Yes | Code can be found at https://github.com/twitter/meta-learning-lstm. |
| Open Datasets | Yes | The Mini-ImageNet dataset was proposed by Vinyals et al. (2016) as a benchmark offering the challenges of the complexity of ImageNet images, without requiring the resources and infrastructure necessary to run on the full ImageNet dataset. |
| Dataset Splits | Yes | We use 64, 16, and 20 classes for training, validation and testing, respectively. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using ADAM for optimization but does not provide specific version numbers for software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | For the learner, we use a simple CNN containing 4 convolutional layers, each of which is a 3×3 convolution with 32 filters, followed by batch normalization, a ReLU non-linearity, and lastly a 2×2 max-pooling. The network then has a final linear layer followed by a softmax for the number of classes being considered. We train our LSTM with ADAM using a learning rate of 0.001 and with gradient clipping using a value of 0.25. (A hedged code sketch of this setup appears below the table.) |
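The sketch below illustrates the reported learner architecture and optimizer settings in PyTorch. The choice of PyTorch, the 84×84 input resolution, same-padding convolutions, and the 5-way default output are assumptions not stated in the quoted setup; note also that in the paper the ADAM learning rate and gradient clipping are reported for training the meta-learner LSTM, so applying them to the CNN's parameters here is purely illustrative.

```python
# Minimal sketch of the learner CNN described in the Experiment Setup row.
# Assumptions (not from the paper's quoted text): PyTorch, 84x84 RGB inputs,
# padding=1 convolutions, and a 5-way classification head.
import torch
import torch.nn as nn

class LearnerCNN(nn.Module):
    def __init__(self, num_classes=5, in_channels=3, hidden=32):
        super().__init__()
        blocks = []
        for i in range(4):  # 4 blocks: 3x3 conv (32 filters) -> BN -> ReLU -> 2x2 max-pool
            blocks += [
                nn.Conv2d(in_channels if i == 0 else hidden, hidden,
                          kernel_size=3, padding=1),
                nn.BatchNorm2d(hidden),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            ]
        self.features = nn.Sequential(*blocks)
        # Final linear layer; the flattened size assumes 84x84 inputs
        # (84 -> 42 -> 21 -> 10 -> 5 after four 2x2 poolings).
        self.classifier = nn.Linear(hidden * 5 * 5, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))  # logits; softmax is folded into the loss

model = LearnerCNN()
# In the paper, ADAM (lr = 0.001) with gradient clipping at 0.25 is reported
# for the meta-learner LSTM; shown here on the CNN parameters for illustration.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One hypothetical training step on dummy data.
images, labels = torch.randn(5, 3, 84, 84), torch.randint(0, 5, (5,))
loss = criterion(model(images), labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)
optimizer.step()
```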