Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes

Authors: Jake Snell, Richard Zemel

ICLR 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present our results on few-shot classification both in terms of accuracy and uncertainty quantification. Additional results comparing the one-vs-each composite likelihood to the softmax, logistic softmax, and Gaussian likelihoods may be found in Section F. |
| Researcher Affiliation | Academia | Jake C. Snell, University of Toronto, Vector Institute (jsnell@cs.toronto.edu); Richard Zemel, University of Toronto, Vector Institute, Canadian Institute for Advanced Research (zemel@cs.toronto.edu) |
| Pseudocode | Yes | Algorithm 1 One-vs-Each Pólya-Gamma GP Learning (a sketch of the underlying Gibbs step follows this table) |
| Open Source Code | Yes | We have made PyTorch code for our experiments publicly available. https://github.com/jakesnell/ove-polya-gamma-gp |
| Open Datasets | Yes | We used the four dataset scenarios described below... CUB. Caltech-UCSD Birds (CUB) (Wah et al., 2011)... mini-Imagenet. The mini-Imagenet dataset (Vinyals et al., 2016)... Omniglot (Lake et al., 2011)... EMNIST dataset (Cohen et al., 2017) |
| Dataset Splits | Yes | mini-Imagenet. The mini-Imagenet dataset (Vinyals et al., 2016) consists of 100 classes with 600 images per class. We used the split proposed by Ravi & Larochelle (2017), which has 64 classes for training, 16 for validation, and 20 for test. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as GPU or CPU models. |
| Software Dependencies | No | We have made PyTorch code for our experiments publicly available. https://github.com/jakesnell/ove-polya-gamma-gp ... For Pólya-Gamma sampling we use the PyPolyaGamma package. https://github.com/slinderman/pypolyagamma |
| Experiment Setup | Yes | All methods employed the commonly-used Conv4 architecture (Vinyals et al., 2016) (see Table 4 for a detailed specification), except ABML which used 32 filters throughout. All of our experiments used the Adam (Kingma & Ba, 2015) optimizer with learning rate 10⁻³. During training, all models used epochs consisting of 100 randomly sampled episodes. A single gradient descent step on the encoder network and relevant hyperparameters is made per episode. All 1-shot models are trained for 600 epochs and 5-shot models are trained for 400 epochs... Each episode contained 5 classes (5-way) and 16 query examples. (Sketches of the episode sampler and the Conv4 encoder follow this table.) |
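The paper's Algorithm 1 (One-vs-Each Pólya-Gamma GP Learning) builds on the Pólya-Gamma augmentation of Polson et al. (2013) for logistic likelihoods. Below is a minimal sketch of the binary building block only, not the paper's full one-vs-each composite likelihood: a Gibbs sampler that alternates between the auxiliary variables ω and the latent GP values f. The function name `pg_gibbs_gp_binary` is hypothetical, and the `pgdrawv(b, c, out)` call follows the PyPolyaGamma README's vectorized sampling interface.

```python
import numpy as np
from pypolyagamma import PyPolyaGamma

def pg_gibbs_gp_binary(K, y, num_iters=100):
    """Hypothetical sketch: Gibbs sampler for binary GP classification
    with a logistic likelihood via Polya-Gamma augmentation.

    K: (n, n) kernel matrix; y: (n,) labels in {0, 1}.
    Returns the final sample of the latent function values f.
    """
    pg = PyPolyaGamma()
    n = len(y)
    kappa = y - 0.5                               # kappa_i = y_i - 1/2
    f = np.zeros(n)
    omega = np.empty(n)
    K_inv = np.linalg.inv(K + 1e-6 * np.eye(n))   # jitter for stability
    for _ in range(num_iters):
        # Auxiliary variables: omega_i | f_i ~ PG(1, f_i), drawn into omega.
        pg.pgdrawv(np.ones(n), f, omega)
        # Conditionally, f | omega, y is Gaussian with
        # covariance Sigma = (K^-1 + diag(omega))^-1 and mean Sigma @ kappa.
        Sigma = np.linalg.inv(K_inv + np.diag(omega))
        f = np.random.multivariate_normal(Sigma @ kappa, Sigma)
    return f
```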
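The Experiment Setup row specifies 5-way episodes with 16 query examples and epochs of 100 randomly sampled episodes. A minimal sketch of episode sampling under the standard few-shot protocol; the helper name `sample_episode` is illustrative, and drawing 16 query examples per class (rather than per episode) is an assumption the quoted text does not resolve.

```python
import random

def sample_episode(examples_by_class, n_way=5, n_shot=1, n_query=16):
    """Illustrative sketch: sample one few-shot episode with n_way classes,
    and n_shot support plus n_query query examples per class, non-overlapping."""
    classes = random.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        picked = random.sample(examples_by_class[cls], n_shot + n_query)
        support += [(x, label) for x in picked[:n_shot]]
        query += [(x, label) for x in picked[n_shot:]]
    return support, query
```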
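The encoder named in the Experiment Setup row is the commonly-used Conv4 backbone of Vinyals et al. (2016): four blocks of 3×3 convolution with 64 filters, batch normalization, ReLU, and 2×2 max-pooling. A sketch consistent with that common recipe (the paper's exact specification is in its Table 4, which is not reproduced here), paired with the stated Adam optimizer at learning rate 10⁻³:

```python
import torch.nn as nn
import torch.optim as optim

def conv_block(in_ch, out_ch):
    # One Conv4 block: 3x3 conv -> batch norm -> ReLU -> 2x2 max-pool.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class Conv4(nn.Module):
    """Standard four-block convolutional encoder with 64 filters per block."""
    def __init__(self, in_channels=3, hidden=64):
        super().__init__()
        self.blocks = nn.Sequential(
            conv_block(in_channels, hidden),
            conv_block(hidden, hidden),
            conv_block(hidden, hidden),
            conv_block(hidden, hidden),
        )

    def forward(self, x):
        # Flatten per-example feature maps into embedding vectors.
        return self.blocks(x).flatten(start_dim=1)

encoder = Conv4()
optimizer = optim.Adam(encoder.parameters(), lr=1e-3)  # learning rate from the paper
```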