Prototypical Networks for Few-shot Learning
Authors: Jake Snell, Kevin Swersky, Richard Zemel
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For few-shot learning, we performed experiments on Omniglot [18] and the miniImageNet version of ILSVRC-2012 [28] with the splits proposed by Ravi and Larochelle [24]. We perform zero-shot experiments on the 2011 version of the Caltech-UCSD Birds dataset (CUB-200-2011) [34]. |
| Researcher Affiliation | Collaboration | Jake Snell (University of Toronto, Vector Institute); Kevin Swersky (Twitter); Richard Zemel (University of Toronto, Vector Institute, Canadian Institute for Advanced Research) |
| Pseudocode | Yes | Algorithm 1 Training episode loss computation for Prototypical Networks. |
| Open Source Code | No | The paper does not contain any statement or link providing concrete access to the source code for the methodology described. |
| Open Datasets | Yes | For few-shot learning, we performed experiments on Omniglot [18] and the miniImageNet version of ILSVRC-2012 [28] with the splits proposed by Ravi and Larochelle [24]. We perform zero-shot experiments on the 2011 version of the Caltech-UCSD Birds dataset (CUB-200-2011) [34]. |
| Dataset Splits | Yes | Their splits use a different set of 100 classes, divided into 64 training, 16 validation, and 20 test classes. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'Adam [13]' as the optimizer but does not specify version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We used an initial learning rate of 10^-3 and cut the learning rate in half every 2000 episodes. We train using 30-way episodes for 1-shot classification and 20-way episodes for 5-shot classification. We match train shot to test shot and each class contains 15 query points per episode. Training episodes were constructed with 50 classes and 10 query images per class. The embeddings were optimized via SGD with Adam at a fixed learning rate of 10^-4 and weight decay of 10^-5. |
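The pseudocode row above refers to the paper's Algorithm 1, which computes the training-episode loss: each class prototype is the mean of its embedded support points, and query points are classified by a softmax over negative squared Euclidean distances to the prototypes. A minimal NumPy sketch of that loss, assuming the embeddings have already been produced by the embedding network (the `prototypical_loss` helper name is ours, not from the paper):

```python
import numpy as np

def prototypical_loss(support, support_labels, query, query_labels, n_classes):
    """Sketch of the Algorithm 1 episode loss on pre-computed embeddings.

    support: (n_support, d) embedded support points
    query:   (n_query, d) embedded query points
    Labels are integer class indices in [0, n_classes).
    """
    # Class prototypes: mean embedded support point per class.
    prototypes = np.stack([support[support_labels == k].mean(axis=0)
                           for k in range(n_classes)])
    # Squared Euclidean distance from each query point to each prototype.
    d2 = ((query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # Log-softmax over negative distances (shifted for numerical stability).
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class, averaged over query points.
    loss = -log_p[np.arange(len(query_labels)), query_labels].mean()
    acc = (log_p.argmax(axis=1) == query_labels).mean()
    return loss, acc
```

The squared Euclidean distance is not incidental: the paper argues it is a Bregman divergence, for which the class mean is the optimal prototype, and reports that it outperforms cosine distance empirically.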