Attentional Meta-learners for Few-shot Polythetic Classification
Authors: Ben J Day, Ramon Viñas Torné, Nikola Simidjievski, Pietro Liò
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Throughout, we evidence these findings and the effectiveness of our proposals with experiments on synthetic and real-world few-shot learning tasks. |
| Researcher Affiliation | Academia | Department of Computer Science and Technology, University of Cambridge, Cambridge, UK. Correspondence to: Ben Day <bjd39@cam.ac.uk>, Ramon Viñas <rv340@cam.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 Self-attention feature scoring. Scores can be used for rescaling or masking. Note that the z-normalisation is over the entire support set whilst the self-attention is within classes. The choice of dispersion measure is of secondary importance and discussed in the main text. (A hedged sketch follows the table.) |
| Open Source Code | Yes | The code is available at https://github.com/rvinas/polythetic_metalearning. |
| Open Datasets | Yes | For example, in the Omniglot task (Lake et al., 2011) we have access to a labelled set of handwritten characters during training and we are tasked with distinguishing new characters, from unseen writing systems, at test time. We build tasks (episodes) using MNIST digits (LeCun et al., 2010), where an example consists of 4 coloured digits (RGB). tieredImageNet (Ren et al., 2018) is a subset of ILSVRC-12 (Russakovsky et al., 2015). |
| Dataset Splits | Yes | For polythetic MNIST, the support set has 96 samples (2 classes, 2 groups per class, and 24 group-specific examples per group). The query set consists of 32 samples (2 classes, 2 groups per class, and 8 group-specific examples per group). In the multi-categorical pre-training, the model achieved a validation accuracy of 95.6% on the digit labels and 100% on the colour labels over 1600 validation examples. (Episode construction is sketched after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a convolutional neural network and an Adam optimizer, but does not provide specific version numbers for software libraries or dependencies. |
| Experiment Setup | Yes | We leverage a convolutional neural network with 4 blocks as a feature extractor. Each block consists of a convolutional layer (64 output channels and 3 × 3 filters), followed by batch normalisation (momentum 0.01), a ReLU activation, and 2 × 2 max pooling. Then, we flatten the output and apply a linear layer to map the data into a 64-dimensional embedding space (unless otherwise stated). We employ an Adam optimiser with learning rate 0.001. We train the models for 10,000 iterations (i.e. tasks) for all experiments, except for full polythetic MNIST (100,000 tasks). (A hedged sketch of this extractor follows the table.) |
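The Pseudocode row describes Algorithm 1 (self-attention feature scoring). Below is a minimal sketch of that step: the z-normalisation over the entire support set and the within-class self-attention follow the paper's note, while the dot-product attention form and variance as the dispersion measure are assumptions on our part (the paper states the dispersion choice is of secondary importance).

```python
import torch
import torch.nn.functional as F

def self_attention_feature_scores(support, labels):
    """Hedged sketch of self-attention feature scoring.

    support: (N, D) support-set embeddings; labels: (N,) class labels.
    Returns per-class, per-feature scores in (0, 1] that could be used
    for rescaling or masking. The attention form and dispersion measure
    here are assumptions, not necessarily the authors' exact choices.
    """
    # z-normalise over the ENTIRE support set (as noted in the paper).
    z = (support - support.mean(0)) / (support.std(0) + 1e-8)

    scores = {}
    for c in labels.unique():
        zc = z[labels == c]                            # (n_c, D) within-class
        # Self-attention WITHIN the class (assumed scaled dot-product softmax).
        attn = F.softmax(zc @ zc.T / zc.shape[1] ** 0.5, dim=-1)
        attended = attn @ zc                           # (n_c, D)
        # Per-feature dispersion of the attended embeddings (variance is one
        # plausible choice; the paper says this choice matters little).
        disp = attended.var(0)
        # Low within-class dispersion -> consistent feature -> high score.
        scores[int(c)] = 1.0 / (1.0 + disp)
    return scores
```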
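The episode sizes in the Dataset Splits row (96 support = 2 classes × 2 groups × 24 examples; 32 query = 2 × 2 × 8) imply the following construction. This is a sketch assuming a hypothetical `groups` mapping from (class, group) pairs to example pools; how the groups themselves are formed (e.g. digit/colour combinations) is not reproduced here.

```python
import numpy as np

def build_episode(groups, n_classes=2, n_groups=2,
                  k_support=24, k_query=8, rng=None):
    """Sketch of polythetic-MNIST episode construction (assumed layout).

    groups: dict mapping (class, group) -> array of examples.
    Returns support and query lists of (example, class_label) pairs,
    sized 96 and 32 respectively with the default arguments.
    """
    rng = rng or np.random.default_rng()
    support, query = [], []
    for c in range(n_classes):
        for g in range(n_groups):
            pool = groups[(c, g)]
            # Sample disjoint support and query examples from this group.
            idx = rng.choice(len(pool), k_support + k_query, replace=False)
            support += [(pool[i], c) for i in idx[:k_support]]
            query += [(pool[i], c) for i in idx[k_support:]]
    return support, query
```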
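The Experiment Setup row specifies the feature extractor in enough detail to sketch it directly in PyTorch. The flattened dimension after the four blocks depends on the input resolution and is an assumption here (shown for 32 × 32 inputs); everything else follows the stated configuration.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch=64):
    # One block as described: 3x3 conv (64 channels), batch norm
    # (momentum 0.01), ReLU, and 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch, momentum=0.01),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class FeatureExtractor(nn.Module):
    def __init__(self, in_channels=3, embed_dim=64, flat_dim=64 * 2 * 2):
        super().__init__()
        self.blocks = nn.Sequential(
            conv_block(in_channels), conv_block(64),
            conv_block(64), conv_block(64),
        )
        # Flatten, then a linear layer into the 64-dimensional embedding
        # space. flat_dim is an assumption: 64 channels * 2 * 2 spatial
        # positions for 32x32 inputs after four 2x2 poolings.
        self.head = nn.Linear(flat_dim, embed_dim)

    def forward(self, x):
        h = self.blocks(x)
        return self.head(h.flatten(1))

# Optimiser as stated in the paper: Adam with learning rate 0.001, e.g.
# optimiser = torch.optim.Adam(model.parameters(), lr=0.001)
```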