Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes
Authors: Jake Snell, Richard Zemel
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our results on few-shot classification both in terms of accuracy and uncertainty quantification. Additional results comparing the one-vs-each composite likelihood to the softmax, logistic softmax, and Gaussian likelihoods may be found in Section F. |
| Researcher Affiliation | Academia | Jake C. Snell, University of Toronto, Vector Institute, jsnell@cs.toronto.edu; Richard Zemel, University of Toronto, Vector Institute, Canadian Institute for Advanced Research, zemel@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1 One-vs-Each Pólya-Gamma GP Learning |
| Open Source Code | Yes | We have made PyTorch code for our experiments publicly available. https://github.com/jakesnell/ove-polya-gamma-gp |
| Open Datasets | Yes | We used the four dataset scenarios described below... CUB. Caltech-UCSD Birds (CUB) (Wah et al., 2011)... mini-Imagenet. The mini-Imagenet dataset (Vinyals et al., 2016)... Omniglot (Lake et al., 2011)... EMNIST dataset (Cohen et al., 2017) |
| Dataset Splits | Yes | mini-Imagenet. The mini-Imagenet dataset (Vinyals et al., 2016) consists of 100 classes with 600 images per class. We used the split proposed by Ravi & Larochelle (2017), which has 64 classes for training, 16 for validation, and 20 for test. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | We have made PyTorch code for our experiments publicly available. https://github.com/jakesnell/ove-polya-gamma-gp... For Pólya-Gamma sampling we use the PyPolyaGamma package. https://github.com/slinderman/pypolyagamma |
| Experiment Setup | Yes | All methods employed the commonly-used Conv4 architecture (Vinyals et al., 2016) (see Table 4 for a detailed specification), except ABML which used 32 filters throughout. All of our experiments used the Adam (Kingma & Ba, 2015) optimizer with learning rate 10⁻³. During training, all models used epochs consisting of 100 randomly sampled episodes. A single gradient descent step on the encoder network and relevant hyperparameters is made per episode. All 1-shot models are trained for 600 epochs and 5-shot models are trained for 400 epochs... Each episode contained 5 classes (5-way) and 16 query examples. |
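The episodic protocol described in the Experiment Setup row (5-way episodes with a fixed number of support shots and 16 query examples per class) can be sketched in plain Python. This is a minimal illustration of episode sampling, not the authors' released code; the function name, signature, and label-dictionary representation are assumptions for the example.

```python
import random
from collections import defaultdict

def sample_episode(labels, n_way=5, n_shot=5, n_query=16, rng=None):
    """Sample one few-shot episode.

    labels: dict mapping example index -> class label.
    Returns (support, query), each a dict mapping each of the n_way
    sampled classes to a list of example indices (n_shot support
    and n_query query per class, disjoint within a class).
    """
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, c in labels.items():
        by_class[c].append(idx)

    # Pick n_way classes, then n_shot + n_query distinct examples per class.
    classes = rng.sample(sorted(by_class), n_way)
    support, query = {}, {}
    for c in classes:
        picks = rng.sample(by_class[c], n_shot + n_query)
        support[c] = picks[:n_shot]
        query[c] = picks[n_shot:]
    return support, query
```

Under the paper's schedule, a training epoch would draw 100 such episodes, with one optimizer step taken per episode.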