Neural Fine-Tuning Search for Few-Shot Learning

Authors: Panagiotis Eustratiadis, Łukasz Dudziak, Da Li, Timothy Hospedales

ICLR 2024

Reproducibility assessment: each variable below is listed with its result, followed by the supporting LLM response.
Research Type: Experimental
"We demonstrate the generality of our NAS method by applying it to both residual networks and vision transformers and report state-of-the-art performance on Meta-Dataset and Meta-Album."
Researcher Affiliation: Collaboration
"Panagiotis Eustratiadis, Łukasz Dudziak, Da Li, Timothy Hospedales (University of Edinburgh; Samsung AI Center, Cambridge)"
Pseudocode: Yes
"In Appendix C, we summarise the supernet training algorithm in pseudocode (Algorithm 1). ... Algorithm 1: Supernet training. ... Algorithm 2: Training time evolutionary search."
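Since the full Algorithm 2 lives in the paper's appendix, the following is a minimal sketch of what a training-time evolutionary search over per-layer fine-tuning decisions could look like. The population size, mutation rate, one-point crossover scheme, and the `evaluate_on_validation` fitness function are all illustrative assumptions, not the authors' exact procedure.

```python
import random

# Hypothetical sketch: evolutionary search over binary per-layer adaptation
# masks (1 = attach/fine-tune an adapter at that layer, 0 = keep frozen).
# `evaluate_on_validation` is an assumed stand-in for a candidate's episodic
# validation accuracy under the trained supernet.
def evolutionary_search(num_layers, evaluate_on_validation,
                        population_size=20, generations=30, mutation_rate=0.1):
    population = [[random.randint(0, 1) for _ in range(num_layers)]
                  for _ in range(population_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate_on_validation, reverse=True)
        parents = scored[:population_size // 2]       # keep the fittest half
        children = []
        while len(children) < population_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_layers)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation_rate else g
                     for g in child]                  # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=evaluate_on_validation)
```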
Open Source Code: Yes
"The source code is available at: https://github.com/peustr/nfts-public."
Open Datasets: Yes
"Evaluation on Meta-Dataset: We evaluate NFTS on the extended version of Meta-Dataset (Requeima et al., 2019; Triantafillou et al., 2020), currently the most commonly used benchmark for few-shot classification, consisting of 13 publicly available datasets: FGVC Aircraft, CU Birds, Describable Textures (DTD), FGVCx Fungi, ImageNet, Omniglot, Quick Draw, VGG Flowers, CIFAR-10/100, MNIST, MSCOCO, and Traffic Signs. ... Evaluation on Meta-Album: Further, we evaluate NFTS on the more recent Meta-Album (Ullah et al., 2022)."
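For context on how such few-shot benchmarks are typically consumed, below is a generic N-way K-shot episode sampler. It is a sketch under the assumption of a simple `images_by_class` mapping from class name to image list; it is not Meta-Dataset's own episode reader, which samples episodes with variable ways and shots.

```python
import random

# Generic N-way K-shot episode sampler (illustrative only).
def sample_episode(images_by_class, n_way=5, k_shot=5, q_queries=15):
    classes = random.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(images_by_class[cls], k_shot + q_queries)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query
```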
Dataset Splits: Yes
"(a) After a supernet is trained, evolutionary search finds the top-performing candidates (validation set). ... This figure shows a 2D t-SNE projection of our 2K-dimensional architecture search space, where the dots are candidate architectures of the evolutionary search process at different iterations. The dots are colored according to their validation accuracy."
Hardware Specification: No
The paper mentions "GPU-days" as a cost measure but does not specify the type or model of GPU used, nor any other hardware such as CPUs or memory.
Software Dependencies: No
The paper does not list software dependencies with version numbers. It mentions backbones ("ResNet-18", "ViT-small") and adapters ("TSA residual adapters", "Prefix Tuning"), but these are model architectures, not versioned software dependencies.
Experiment Setup: Yes
"Table 12 reports the hyperparameters used for all of our experiments. Note the following clarifications: Number of epochs refers to multiple forward passes of the same episode, while Number of episodes refers to the number of episodes sampled in total. The batch size is not mentioned because we only conduct episodic learning, where we do not split the episode into batches, i.e., we feed the entire support and query set into our neural network architectures. Learning rate warmup, where applicable, occurs for the first 10% of the episodes. ... While our strongest competitors (Li et al., 2022; Xu et al., 2022) tune their learning rates for meta-testing (e.g., TSA uses LR=0.1 for seen domains and LR=1.0 for unseen, and ETT uses a different learning rate per downstream Meta-Dataset domain), we treat meta-testing episodes as completely unknown and use the same hyperparameters we used on the validation set during search."
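To make the quoted setup concrete, here is a minimal PyTorch-style sketch of episodic training in which the entire support and query set is consumed in a single forward pass (no mini-batching) and the learning rate warms up linearly over the first 10% of episodes. The model interface, `sample_episode`, and all hyperparameter values are assumptions for illustration, not the authors' implementation.

```python
import torch

# Illustrative sketch: episodic fine-tuning with linear LR warmup over
# the first 10% of episodes, feeding the whole episode at once.
def run_episodes(model, sample_episode, loss_fn, num_episodes=2000, base_lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr)
    warmup = max(1, int(0.1 * num_episodes))  # warmup span: first 10% of episodes
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda ep: min(1.0, (ep + 1) / warmup))
    for _ in range(num_episodes):
        support_x, support_y, query_x, query_y = sample_episode()
        logits = model(support_x, support_y, query_x)  # full episode, one pass
        loss = loss_fn(logits, query_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()
```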