Prototypical Networks for Few-shot Learning

Authors: Jake Snell, Kevin Swersky, Richard Zemel

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For few-shot learning, we performed experiments on Omniglot [18] and the miniImageNet version of ILSVRC-2012 [28] with the splits proposed by Ravi and Larochelle [24]. We perform zero-shot experiments on the 2011 version of the Caltech-UCSD bird dataset (CUB-200 2011) [34].
Researcher Affiliation | Collaboration | Jake Snell (University of Toronto, Vector Institute); Kevin Swersky (Twitter); Richard Zemel (University of Toronto, Vector Institute, Canadian Institute for Advanced Research)
Pseudocode | Yes | Algorithm 1: Training episode loss computation for Prototypical Networks (a sketch of this computation appears below the table).
Open Source Code | No | The paper does not contain any statement or link providing concrete access to the source code for the methodology described.
Open Datasets | Yes | For few-shot learning, we performed experiments on Omniglot [18] and the miniImageNet version of ILSVRC-2012 [28] with the splits proposed by Ravi and Larochelle [24]. We perform zero-shot experiments on the 2011 version of the Caltech-UCSD bird dataset (CUB-200 2011) [34].
Dataset Splits | Yes | Their splits use a different set of 100 classes, divided into 64 training, 16 validation, and 20 test classes (an episode-sampling sketch appears below the table).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using 'Adam [13]' as the optimizer but does not specify version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | We used an initial learning rate of 10^-3 and cut the learning rate in half every 2000 episodes. We train using 30-way episodes for 1-shot classification and 20-way episodes for 5-shot classification. We match train shot to test shot and each class contains 15 query points per episode. Training episodes were constructed with 50 classes and 10 query images per class. The embeddings were optimized via SGD with Adam at a fixed learning rate of 10^-4 and weight decay of 10^-5. (A sketch of this training setup appears below the table.)
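
The Pseudocode row refers to Algorithm 1, the training-episode loss computation. As a reading aid, the following is a minimal PyTorch-style sketch of that computation (class prototypes as mean support embeddings, queries classified by a softmax over negative squared Euclidean distances). The names `embed`, `support`, and `query` and the tensor layout are illustrative assumptions, not taken from any released code.

```python
import torch
import torch.nn.functional as F

def episode_loss(embed, support, query):
    """Sketch of a Prototypical Networks training-episode loss.

    embed:   any module mapping inputs to D-dimensional embeddings
    support: tensor of shape (n_classes, k_shot, *input_shape)
    query:   tensor of shape (n_classes, n_query, *input_shape)
    """
    n_classes, k_shot = support.shape[:2]
    n_query = query.shape[1]

    # Embed support points and average per class to get prototypes: (N, D).
    z_support = embed(support.flatten(0, 1))                      # (N*K, D)
    prototypes = z_support.view(n_classes, k_shot, -1).mean(dim=1)

    # Embed query points: (N*Q, D).
    z_query = embed(query.flatten(0, 1))

    # Squared Euclidean distance from each query to each prototype: (N*Q, N).
    dists = torch.cdist(z_query, prototypes) ** 2

    # Softmax over negative distances gives class log-probabilities.
    log_p = F.log_softmax(-dists, dim=1)

    # Query i*Q + j belongs to class i.
    labels = torch.arange(n_classes, device=log_p.device).repeat_interleave(n_query)
    return F.nll_loss(log_p, labels)
```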
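
The Dataset Splits row notes that miniImageNet's 100 classes are divided into 64 training, 16 validation, and 20 test classes, with episodes drawn within a split. A rough sketch of such an N-way episode sampler, assuming a hypothetical `images_by_class` dictionary mapping each class id to a tensor of its images (this helper is not part of the paper):

```python
import random
import torch

def sample_episode(images_by_class, class_ids, n_way, k_shot, n_query):
    """Draw one N-way episode from a class-disjoint split (e.g. the 64 train classes)."""
    episode_classes = random.sample(class_ids, n_way)
    support, query = [], []
    for c in episode_classes:
        images = images_by_class[c]
        # Draw k_shot support and n_query query examples without replacement.
        idx = torch.randperm(len(images))[: k_shot + n_query]
        support.append(images[idx[:k_shot]])
        query.append(images[idx[k_shot:]])
    # Shapes: (n_way, k_shot, ...) and (n_way, n_query, ...).
    return torch.stack(support), torch.stack(query)
```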
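
The Experiment Setup row quotes an initial learning rate of 10^-3 halved every 2000 episodes, and 30-way episodes for 1-shot classification with 15 query points per class. Below is a hedged sketch of how that schedule and episodic loop could be wired together, reusing the two sketches above; the total episode count and all helper names are assumptions, not values reported by the paper.

```python
import torch

def train(encoder, images_by_class, train_class_ids,
          n_episodes=20000, n_way=30, k_shot=1, n_query=15):
    # Adam at an initial learning rate of 1e-3, halved every 2000 episodes
    # (as quoted in the Experiment Setup row). n_episodes is illustrative.
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.5)

    for _ in range(n_episodes):
        support, query = sample_episode(images_by_class, train_class_ids,
                                        n_way, k_shot, n_query)
        loss = episode_loss(encoder, support, query)  # from the sketch above
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # halves the learning rate every 2000 episodes
```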