A Theoretical Analysis of the Number of Shots in Few-Shot Learning
Authors: Tianshi Cao, Marc T. Law, Sanja Fidler
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce a theoretical analysis of the impact of the shot number on Prototypical Networks, a state-of-the-art few-shot classification method. From our analysis, we propose a simple method that is robust to the choice of shot number used during meta-training, which is a crucial hyperparameter. Our model, trained with an arbitrary meta-training shot number, performs well across different values of the meta-testing shot number. We experimentally demonstrate our approach on different few-shot classification benchmarks. |
| Researcher Affiliation | Collaboration | Tianshi Cao (1, 2), Marc T. Law (1, 2, 3), Sanja Fidler (1, 2, 3); (1) Department of Computer Science, University of Toronto; (2) Vector Institute; (3) NVIDIA |
| Pseudocode | Yes | Appendix A.1 ("Algorithm for EST") presents Algorithm 1, "Algorithm for computing the transformation T"; a hedged sketch of such a transformation follows this table. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Experiments are performed on three datasets: Omniglot (Lake et al., 2015), miniImageNet (Vinyals et al., 2016), and tieredImageNet (Ren et al., 2018). |
| Dataset Splits | Yes | For miniImageNet experiments, we use the splits proposed by Ravi & Larochelle (2017), where 64 classes are used for training, 16 for validation, and 20 for testing. tieredImageNet... In total, there are 351 classes in training, 97 in validation, and 160 in testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types) used for running the experiments. It only describes the network architectures. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer, but does not provide version numbers for any software components (e.g., Python, PyTorch, TensorFlow, specific libraries). |
| Experiment Setup | Yes | The Adam (Kingma & Ba, 2014) optimizer is used with β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸, and an initial learning rate of 0.001 that is decayed by half every 2000 episodes. On Omniglot, training uses 60 classes and 5 query points per episode; on miniImageNet and tieredImageNet, 20 classes and 15 query points per episode. A hedged training-loop sketch follows this table. |
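The Pseudocode row above refers to Algorithm 1, which computes the transformation T used by the paper's EST method. The paper's algorithm is not reproduced here; as a rough illustration only, below is a minimal NumPy sketch of one plausible LDA-style construction of such a transformation (whiten the within-class covariance, then project onto the leading between-class directions). The function name, its signature, and the exact construction are assumptions for illustration, not the paper's Algorithm 1.

```python
import numpy as np

def compute_transformation(embeddings, labels, d, ridge=1e-6):
    """Hypothetical EST-style transformation T (LDA-like sketch, NOT the paper's Algorithm 1).

    embeddings: (n, p) array of meta-training features (>= 2 samples per class assumed)
    labels:     (n,)   integer class labels
    d:          target dimensionality of the transformed space
    """
    classes = np.unique(labels)
    p = embeddings.shape[1]
    mu = embeddings.mean(axis=0)
    sw = np.zeros((p, p))  # average within-class covariance
    sb = np.zeros((p, p))  # between-class covariance of class means
    for c in classes:
        x = embeddings[labels == c]
        sw += np.cov(x, rowvar=False)
        diff = (x.mean(axis=0) - mu)[:, None]
        sb += diff @ diff.T
    sw /= len(classes)
    sb /= len(classes)
    # Whitening matrix Sigma_W^{-1/2}, with a small ridge for numerical stability.
    evals, evecs = np.linalg.eigh(sw + ridge * np.eye(p))
    w = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # Keep the top-d eigenvectors of the whitened between-class covariance
    # (eigh returns eigenvalues in ascending order, so the last d columns are the top d).
    _, v = np.linalg.eigh(w @ sb @ w.T)
    return v[:, -d:].T @ w  # T has shape (d, p)
```

Under this construction, an embedding z would be mapped to T @ z before prototypes and distances are computed.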
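The Experiment Setup row translates directly into standard training code. Below is a minimal PyTorch sketch of that configuration, assuming the standard Adam parameters β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸ quoted above; the encoder architecture, the total number of episodes, and the loss computation are placeholders, not details from the paper.

```python
import torch

# Placeholder encoder; the paper's convolutional backbone is not reproduced here.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 84 * 84, 64))

# Adam with the hyperparameters quoted above and initial learning rate 0.001.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)
# Halve the learning rate every 2000 episodes, as stated in the table.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2000, gamma=0.5)

num_episodes = 40000  # assumption; the total episode count is not quoted above
for episode in range(num_episodes):
    optimizer.zero_grad()
    # Sample an episode (e.g., 20 classes with 15 query points each on
    # mini/tieredImageNet), compute the prototypical-network loss, then:
    #   loss.backward()
    optimizer.step()
    scheduler.step()
```

`StepLR` with `step_size=2000` and `gamma=0.5` implements the quoted schedule of decaying the learning rate by half every 2000 episodes.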