One Meta-tuned Transformer is What You Need for Few-shot Learning

Authors: Xu Yang, Huaxiu Yao, Ying Wei

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | MetaFormer demonstrates coherence and compatibility with off-the-shelf pre-trained vision transformers and shows significant improvements in both inductive and transductive few-shot learning scenarios, outperforming state-of-the-art methods by up to 8.77% and 6.25% on 12 in-domain and 10 cross-domain datasets, respectively.
Researcher Affiliation | Academia | City University of Hong Kong; University of North Carolina at Chapel Hill; Nanyang Technological University. Correspondence to: Xu Yang <xyang337-c@my.cityu.edu.hk>, Ying Wei <ying.wei@ntu.edu.sg>.
Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. Figures depict architectures and processes but are not in a code-like format.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code (e.g., 'Our code is available at...') and does not provide a link to a code repository.
Open Datasets | Yes | We train and evaluate our MetaFormer on the four standard few-shot benchmarks: miniImageNet (Vinyals et al., 2016b), tieredImageNet (Ren et al., 2018b), CIFAR-FS (Bertinetto et al., 2019) and FC-100 (Oreshkin et al., 2018).
Dataset Splits | Yes | In all experiments, we follow the standard data usage specifications same as Hiller et al. (2022), splitting data into the meta-training set, meta-validation set, and meta-test set, and classes in each set are mutually exclusive. [...] The classes are divided into 64, 16, and 20 for training, validation, and test, respectively. (A minimal episode-sampling sketch based on these splits is given after the table.)
Hardware Specification | Yes | The evaluation of inference latency is conducted on an NVIDIA RTX A6000 GPU.
Software Dependencies | No | The paper mentions using 'the SGD optimizer' but does not specify version numbers for programming languages, libraries (e.g., PyTorch, TensorFlow), or other software components necessary for reproduction.
Experiment Setup | Yes | We use the image resolution of 224×224 and the output is projected to 8192 dimensions. A patch size of 16 and window size of 7 are used... A batch size of 512 and a cosine-decaying learning rate schedule are used. [...] We employ the SGD optimizer, utilizing a cosine-decaying learning rate initiated at 2×10^-4, a momentum value of 0.9, and a weight decay of 5×10^-4 across all datasets. The input image size is set to 224×224 for MetaFormer and 360×360 for SMKD-MetaFormer. Typically, training is conducted for a maximum of 200 epochs. (A sketch of these optimizer and schedule settings is given after the table.)
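
The dataset-split row above describes mutually exclusive class splits (e.g., 64/16/20 classes for miniImageNet) and episodic evaluation. Below is a minimal sketch of how such disjoint splits feed N-way K-shot episode sampling; the class names and the `images_by_class` mapping are hypothetical placeholders, not the authors' data pipeline.

```python
import random

# Placeholder class-name lists (hypothetical); the paper follows the splits of
# Hiller et al. (2022): miniImageNet's 100 classes divided 64/16/20 into
# mutually exclusive meta-train / meta-validation / meta-test sets.
meta_train_classes = [f"class_{i:03d}" for i in range(0, 64)]
meta_val_classes = [f"class_{i:03d}" for i in range(64, 80)]
meta_test_classes = [f"class_{i:03d}" for i in range(80, 100)]

# The three splits share no classes.
assert not set(meta_train_classes) & set(meta_val_classes)
assert not set(meta_train_classes) & set(meta_test_classes)
assert not set(meta_val_classes) & set(meta_test_classes)


def sample_episode(class_pool, images_by_class, n_way=5, k_shot=1, n_query=15):
    """Draw one N-way K-shot episode (support + query) from a single split."""
    episode_classes = random.sample(class_pool, n_way)
    support, query = [], []
    for label, cls in enumerate(episode_classes):
        images = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in images[:k_shot]]
        query += [(img, label) for img in images[k_shot:]]
    return support, query
```

Because meta-test episodes are drawn only from `meta_test_classes`, no test class is ever seen during meta-training, which is the point of the mutually exclusive splits quoted above.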
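
The experiment-setup row reports SGD with a cosine-decaying learning rate starting at 2×10^-4, momentum 0.9, weight decay 5×10^-4, batch size 512, and up to 200 epochs. The following is a minimal PyTorch sketch of those reported settings only; the `torch.nn.Linear` stand-in and the empty loop body are placeholders, not the authors' MetaFormer implementation (which is not released).

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

# Stand-in module; the paper meta-tunes an off-the-shelf pre-trained vision
# transformer, which is not reproduced here.
model = torch.nn.Linear(768, 8192)

max_epochs = 200   # "training is conducted for a maximum of 200 epochs"
batch_size = 512   # reported batch size (consumed by the episode loader, not shown)

optimizer = SGD(
    model.parameters(),
    lr=2e-4,        # cosine-decaying learning rate initiated at 2x10^-4
    momentum=0.9,
    weight_decay=5e-4,
)
scheduler = CosineAnnealingLR(optimizer, T_max=max_epochs)

for epoch in range(max_epochs):
    # ... one meta-training epoch over few-shot episodes would run here ...
    scheduler.step()
```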