Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Hard-Meta-Dataset++: Towards Understanding Few-Shot Performance on Difficult Tasks

Authors: Samyadeep Basu, Megan Stanley, John F Bronskill, Soheil Feizi, Daniela Massiceti

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We stress test an extensive suite of state-of-the-art few-shot classification methods on HARD-MD++, cross-validating the difficulty of our extracted tasks across these top-performing methods. In Fig 1, we show one such method (Hu et al., 2022) performing consistently worse on the META-DATASET test split in HARD-MD++ than on the original MD test split across all 10 MD sub-datasets. In Section 5, we find that this trend holds true across a wide range of few-shot classification methods.
Researcher Affiliation | Collaboration | Samyadeep Basu, Megan Stanley, John Bronskill, Soheil Feizi, Daniela Massiceti EMAIL, {jfb54}@cam.ac.uk EMAIL
Pseudocode | Yes | Algorithm 1 FASTDIFFSEL: Efficient algorithm for extracting a difficult few-shot task
Open Source Code | No | We will publicly release the code and HARD-MD++ upon the acceptance of our manuscript.
Open Datasets | Yes | We leverage the scalability of FASTDIFFSEL to extract difficult tasks from a wide range of large-scale vision datasets including META-DATASET (Triantafillou et al., 2019), OBJECTNET (Barbu et al., 2019), CURE-OR (Temel et al., 2018) and ORBIT (Massiceti et al., 2021)
Dataset Splits | Yes | All the sub-datasets (except mscoco and traffic signs) are split into disjoint train/val/test classes, whereas mscoco and traffic sign are test only sub-datasets.
Hardware Specification | Yes | We extract 50 5-way 5-shot tasks per sub-dataset using Hu et al. (2022) as the base model (ViT-S initialized with SSL DINO weights then meta-trained with ProtoNets on MD's ilsvrc_2012) on an A5000 GPU (64GB RAM). We train this model using distributed training on 8 A6000 GPUs.
Software Dependencies | No | The paper mentions deep learning models and GPU usage (e.g., 'ViT-S', 'ResNet', 'A5000 GPU') but does not specify versions for software dependencies like PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | The fine-tuning algorithm from (Hu et al., 2022) has three hyperparameters: (i) learning rate; (ii) number of fine-tuning steps and (iii) probability of switching on data-augmentation for the support set. In our experiments, we set the number of fine-tuning steps as 50 and the probability of switching on data-augmentation for the support set as 0.9. We select the optimal learning rate from {0.0001, 0.001, 0.01} with the help of a separate validation set. We run the training for 50 epochs, where in each epoch 2000 episodes from ilsvrc_2012 are sampled. In total, we meta-train on 100k variable-way, variable-shot episodes from ilsvrc_2012. We train using the SGD optimizer with a momentum of 0.9. We use a learning rate of 5e-4 with cosine scheduler for our experiments.
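
The numbers quoted under Experiment Setup can be checked with a short sketch. The code below is not the authors' implementation: it only reproduces the episode arithmetic (50 epochs of 2000 episodes) and illustrates one common form of cosine learning-rate decay and a grid selection over the three quoted fine-tuning learning rates. The per-episode stepping, the absence of warmup, the decay floor `min_lr=0.0`, and the names `cosine_lr`, `select_learning_rate`, and `validation_accuracy` are assumptions made for illustration; the quoted text does not specify them.

```python
import math

# Values quoted in the Experiment Setup row.
EPOCHS = 50
EPISODES_PER_EPOCH = 2000
BASE_LR = 5e-4            # meta-training learning rate
MOMENTUM = 0.9            # SGD momentum (listed for completeness)
FINE_TUNE_LR_GRID = (0.0001, 0.001, 0.01)

# 50 epochs x 2000 episodes = 100,000 episodes, matching the quoted "100k".
TOTAL_EPISODES = EPOCHS * EPISODES_PER_EPOCH


def cosine_lr(step, total_steps=TOTAL_EPISODES, base_lr=BASE_LR, min_lr=0.0):
    """Cosine-annealed learning rate at a given step.

    Assumption: the schedule is stepped once per episode, has no warmup,
    and decays to min_lr. The source only says "cosine scheduler".
    """
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def select_learning_rate(validation_accuracy, grid=FINE_TUNE_LR_GRID):
    """Pick the grid value with the highest validation accuracy.

    validation_accuracy is a hypothetical callable mapping a learning
    rate to an accuracy measured on a separate validation set.
    """
    return max(grid, key=validation_accuracy)
```

Under these assumptions the learning rate starts at 5e-4, reaches half that value midway through training, and decays to zero at the final episode.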