Attentional Meta-learners for Few-shot Polythetic Classification
Authors: Ben J Day, Ramon Viñas Torné, Nikola Simidjievski, Pietro Liò
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Throughout, we evidence these findings and the effectiveness of our proposals with experiments on synthetic and real-world few-shot learning tasks. |
| Researcher Affiliation | Academia | Department of Computer Science and Technology, University of Cambridge, Cambridge, UK. Correspondence to: Ben Day <bjd39@cam.ac.uk>, Ramon Viñas <rv340@cam.ac.uk>. |
| Pseudocode | Yes | Algorithm 1 Self-attention feature scoring. Scores can be used for rescaling or masking. Note that the z-normalisation is over the entire support set whilst the self-attention is within classes. The choice of dispersion measure is of secondary importance and discussed in the main text. (A hedged sketch follows the table.) |
| Open Source Code | Yes | The code is available at https://github.com/rvinas/polythetic_metalearning. |
| Open Datasets | Yes | For example, in the Omniglot task (Lake et al., 2011) we have access to a labelled set of handwritten characters during training and we are tasked with distinguishing new characters, from unseen writing systems, at test time. We build tasks (episodes) using MNIST digits (LeCun et al., 2010), where an example consists of 4 coloured digits (RGB). tieredImageNet (Ren et al., 2018) is a subset of ILSVRC-12 (Russakovsky et al., 2015). |
| Dataset Splits | Yes | For polythetic MNIST, the support set has 96 samples (2 classes, 2 groups per class, and 24 group-specific examples per group). The query set consists of 32 samples (2 classes, 2 groups per class, and 8 group-specific examples per group). In the multi-categorical pre-training, the model achieved a validation accuracy of 95.6% on the digit labels and 100% on the colour labels over 1600 validation examples. (Episode construction is sketched after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using a convolutional neural network and an Adam optimizer, but does not provide specific version numbers for software libraries or dependencies. |
| Experiment Setup | Yes | We leverage a convolutional neural network with 4 blocks as a feature extractor. Each block consists of a convolutional layer (64 output channels and 3 × 3 filters), followed by batch normalisation (momentum 0.01), a ReLU activation, and 2 × 2 max pooling. Then, we flatten the output and apply a linear layer to map the data into a 64-dimensional embedding space (unless otherwise stated). We employ an Adam optimiser with learning rate 0.001. We train the models for 10,000 iterations (i.e. tasks) for all experiments, except for full polythetic MNIST (100,000 tasks). (A hedged sketch of this extractor follows the table.) |
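The Pseudocode row describes Algorithm 1 (self-attention feature scoring). Below is a minimal sketch of that step: the z-normalisation over the entire support set and the within-class self-attention follow the paper's note, while the dot-product attention form and variance as the dispersion measure are assumptions on our part (the paper states the dispersion choice is of secondary importance).

```python
import torch
import torch.nn.functional as F

def self_attention_feature_scores(support, labels):
    """Hedged sketch of self-attention feature scoring.

    support: (N, D) support-set embeddings; labels: (N,) class labels.
    Returns per-class, per-feature scores in (0, 1] that could be used
    for rescaling or masking. The attention form and dispersion measure
    here are assumptions, not necessarily the authors' exact choices.
    """
    # z-normalise over the ENTIRE support set (as noted in the paper).
    z = (support - support.mean(0)) / (support.std(0) + 1e-8)

    scores = {}
    for c in labels.unique():
        zc = z[labels == c]                            # (n_c, D) within-class
        # Self-attention WITHIN the class (assumed scaled dot-product softmax).
        attn = F.softmax(zc @ zc.T / zc.shape[1] ** 0.5, dim=-1)
        attended = attn @ zc                           # (n_c, D)
        # Per-feature dispersion of the attended embeddings (variance is one
        # plausible choice; the paper says this choice matters little).
        disp = attended.var(0)
        # Low within-class dispersion -> consistent feature -> high score.
        scores[int(c)] = 1.0 / (1.0 + disp)
    return scores
```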
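The episode sizes in the Dataset Splits row (96 support = 2 classes × 2 groups × 24 examples; 32 query = 2 × 2 × 8) imply the following construction. This is a sketch assuming a hypothetical `groups` mapping from (class, group) pairs to example pools; how the groups themselves are formed (e.g. digit/colour combinations) is not reproduced here.

```python
import numpy as np

def build_episode(groups, n_classes=2, n_groups=2,
                  k_support=24, k_query=8, rng=None):
    """Sketch of polythetic-MNIST episode construction (assumed layout).

    groups: dict mapping (class, group) -> array of examples.
    Returns support and query lists of (example, class_label) pairs,
    sized 96 and 32 respectively with the default arguments.
    """
    rng = rng or np.random.default_rng()
    support, query = [], []
    for c in range(n_classes):
        for g in range(n_groups):
            pool = groups[(c, g)]
            # Sample disjoint support and query examples from this group.
            idx = rng.choice(len(pool), k_support + k_query, replace=False)
            support += [(pool[i], c) for i in idx[:k_support]]
            query += [(pool[i], c) for i in idx[k_support:]]
    return support, query
```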
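The Experiment Setup row specifies the feature extractor in enough detail to sketch it directly in PyTorch. The flattened dimension after the four blocks depends on the input resolution and is an assumption here (shown for 32 × 32 inputs); everything else follows the stated configuration.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch=64):
    # One block as described: 3x3 conv (64 channels), batch norm
    # (momentum 0.01), ReLU, and 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch, momentum=0.01),
        nn.ReLU(),
        nn.MaxPool2d(2),
    )

class FeatureExtractor(nn.Module):
    def __init__(self, in_channels=3, embed_dim=64, flat_dim=64 * 2 * 2):
        super().__init__()
        self.blocks = nn.Sequential(
            conv_block(in_channels), conv_block(64),
            conv_block(64), conv_block(64),
        )
        # Flatten, then a linear layer into the 64-dimensional embedding
        # space. flat_dim is an assumption: 64 channels * 2 * 2 spatial
        # positions for 32x32 inputs after four 2x2 poolings.
        self.head = nn.Linear(flat_dim, embed_dim)

    def forward(self, x):
        h = self.blocks(x)
        return self.head(h.flatten(1))

# Optimiser as stated in the paper: Adam with learning rate 0.001, e.g.
# optimiser = torch.optim.Adam(model.parameters(), lr=0.001)
```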