Hard-Meta-Dataset++: Towards Understanding Few-Shot Performance on Difficult Tasks
Authors: Samyadeep Basu, Megan Stanley, John F Bronskill, Soheil Feizi, Daniela Massiceti
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We stress test an extensive suite of state-of-the-art few-shot classification methods on HARD-MD++, cross-validating the difficulty of our extracted tasks across these top-performing methods. In Figure 1, we show one such method (Hu et al., 2022) performing consistently worse on the META-DATASET test split in HARD-MD++ than on the original MD test split across all 10 MD sub-datasets. In Section 5, we find that this trend holds true across a wide range of few-shot classification methods. |
| Researcher Affiliation | Collaboration | Samyadeep Basu, Megan Stanley, John Bronskill, Soheil Feizi, Daniela Massiceti; {sbasu12, sfeizi}@umd.edu, {jfb54}@cam.ac.uk, {meganstanley, dmassiceti}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 FASTDIFFSEL: Efficient algorithm for extracting a difficult few-shot task |
| Open Source Code | No | We will publicly release the code and HARD-MD++ upon the acceptance of our manuscript. |
| Open Datasets | Yes | We leverage the scalability of FASTDIFFSEL to extract difficult tasks from a wide range of large-scale vision datasets including META-DATASET (Triantafillou et al., 2019), OBJECTNET (Barbu et al., 2019), CURE-OR (Temel et al., 2018) and ORBIT (Massiceti et al., 2021) |
| Dataset Splits | Yes | All the sub-datasets (except mscoco and traffic signs) are split into disjoint train/val/test classes, whereas mscoco and traffic signs are test-only sub-datasets. |
| Hardware Specification | Yes | We extract 50 5-way 5-shot tasks per sub-dataset using Hu et al. (2022) as the base model (ViT-S initialized with SSL DINO weights then meta-trained with ProtoNets on MD's ilsvrc 2012) on an A5000 GPU (64GB RAM). We train this model using distributed training on 8 A6000 GPUs. |
| Software Dependencies | No | The paper mentions deep learning models and GPU usage (e.g., 'ViT-S', 'ResNet', 'A5000 GPU') but does not specify versions for software dependencies like PyTorch, TensorFlow, or CUDA. |
| Experiment Setup | Yes | The fine-tuning algorithm from Hu et al. (2022) has three hyperparameters: (i) learning rate; (ii) number of fine-tuning steps; and (iii) probability of switching on data augmentation for the support set. In our experiments, we set the number of fine-tuning steps to 50 and the probability of switching on data augmentation for the support set to 0.9. We select the optimal learning rate from {0.0001, 0.001, 0.01} with the help of a separate validation set. We run the training for 50 epochs, where in each epoch 2000 episodes from ilsvrc 2012 are sampled. In total, we meta-train on 100k variable-way, variable-shot episodes from ilsvrc 2012. We train using the SGD optimizer with a momentum of 0.9. We use a learning rate of 5e-4 with a cosine scheduler for our experiments. |
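The hyperparameter protocol quoted in the Experiment Setup row (fixed fine-tuning steps and augmentation probability, learning rate chosen on a separate validation set) can be sketched as follows. This is a minimal illustration, not the authors' code: `evaluate_on_validation` is a hypothetical stand-in for running the Hu et al. (2022) fine-tuning pipeline and returning validation accuracy.

```python
# Hedged sketch of the learning-rate selection described in the paper:
# fine-tuning steps (50) and support-set augmentation probability (0.9)
# are fixed; the learning rate is grid-searched on a validation set.

FINE_TUNE_STEPS = 50           # fixed per the paper
AUGMENTATION_PROB = 0.9        # probability of augmenting the support set
LEARNING_RATE_GRID = [0.0001, 0.001, 0.01]

def select_learning_rate(evaluate_on_validation):
    """Return the learning rate (and accuracy) that maximizes
    validation accuracy under the fixed fine-tuning settings."""
    best_lr, best_acc = None, float("-inf")
    for lr in LEARNING_RATE_GRID:
        acc = evaluate_on_validation(
            lr=lr,
            steps=FINE_TUNE_STEPS,
            aug_prob=AUGMENTATION_PROB,
        )
        if acc > best_acc:
            best_lr, best_acc = lr, acc
    return best_lr, best_acc
```

In practice the callback would fine-tune the ViT-S model on each validation task and average episode accuracies; the grid and fixed values above come directly from the paper's quoted setup.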