One Meta-tuned Transformer is What You Need for Few-shot Learning
Authors: Xu Yang, Huaxiu Yao, Ying Wei
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | MetaFormer demonstrates coherence and compatibility with off-the-shelf pre-trained vision transformers and shows significant improvements in both inductive and transductive few-shot learning scenarios, outperforming state-of-the-art methods by up to 8.77% and 6.25% on 12 in-domain and 10 cross-domain datasets, respectively. |
| Researcher Affiliation | Academia | ¹City University of Hong Kong, ²University of North Carolina at Chapel Hill, ³Nanyang Technological University. Correspondence to: Xu Yang <xyang337-c@my.cityu.edu.hk>, Ying Wei <ying.wei@ntu.edu.sg>. |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. Figures depict architectures and processes but are not in a code-like format. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code, such as 'Our code is available at...' or provide a link to a code repository. |
| Open Datasets | Yes | We train and evaluate our MetaFormer on the four standard few-shot benchmarks: miniImageNet (Vinyals et al., 2016b), tieredImageNet (Ren et al., 2018b), CIFAR-FS (Bertinetto et al., 2019) and FC100 (Oreshkin et al., 2018). |
| Dataset Splits | Yes | In all experiments, we follow the same standard data usage specifications as Hiller et al. (2022), splitting data into the meta-training set, meta-validation set, and meta-test set, and the classes in each set are mutually exclusive. [...] The classes are divided into 64, 16, and 20 for training, validation, and test, respectively. An episode-sampling sketch built on these splits follows the table. |
| Hardware Specification | Yes | The evaluation of inference latency is conducted on an NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions using 'the SGD optimizer' but does not specify version numbers for programming languages, libraries (e.g., PyTorch, TensorFlow), or other software components necessary for reproduction. |
| Experiment Setup | Yes | We use the image resolution of 224×224 and the output is projected to 8192 dimensions. A patch size of 16 and window size of 7 are used... A batch size of 512 and a cosine-decaying learning rate schedule are used. [...] We employ the SGD optimizer, utilizing a cosine-decaying learning rate initiated at 2×10⁻⁴, a momentum value of 0.9, and a weight decay of 5×10⁻⁴ across all datasets. The input image size is set to 224×224 for MetaFormer and 360×360 for SMKD-MetaFormer. Typically, training is conducted for a maximum of 200 epochs. A sketch of this configuration follows the table. |
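
The class-disjoint splits quoted above (64/16/20 classes for miniImageNet) feed the standard N-way K-shot episodic protocol used in few-shot evaluation. Below is a minimal sketch of that protocol, assuming Python; `split_classes`, `sample_episode`, and `images_by_class` are illustrative names, not code from the paper.

```python
import random

def split_classes(all_classes, n_train=64, n_val=16, n_test=20, seed=0):
    """Partition the 100 miniImageNet classes into mutually exclusive
    meta-train / meta-validation / meta-test pools (64/16/20)."""
    rng = random.Random(seed)
    classes = list(all_classes)
    rng.shuffle(classes)
    return (classes[:n_train],
            classes[n_train:n_train + n_val],
            classes[n_train + n_val:])

def sample_episode(images_by_class, class_pool, n_way=5, k_shot=1,
                   n_query=15, rng=random):
    """Draw one N-way K-shot episode: a support set and a query set over
    N classes sampled from a single (disjoint) class pool, so that
    meta-test classes are never seen during meta-training."""
    episode_classes = rng.sample(class_pool, n_way)
    support, query = [], []
    for label, cls in enumerate(episode_classes):
        imgs = rng.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query
```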
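
The Experiment Setup row reports concrete optimization hyperparameters (SGD, cosine-decaying LR from 2×10⁻⁴, momentum 0.9, weight decay 5×10⁻⁴, up to 200 epochs, 224×224 inputs). A minimal sketch of that configuration, assuming PyTorch, is below; the backbone, loss, and synthetic batch are placeholders to keep the sketch self-contained and are not the authors' released training code.

```python
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR

# Tiny stand-in for the MetaFormer backbone (the actual model is a
# pre-trained vision transformer): patchify 224x224 input with a
# patch size of 16, then project to 8192 dimensions, as reported.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=16, stride=16),  # 224/16 -> 14x14 patches
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 8192),                # 8192-dim projection
)

optimizer = SGD(
    model.parameters(),
    lr=2e-4,            # cosine-decaying LR initiated at 2e-4
    momentum=0.9,       # momentum value reported in the paper
    weight_decay=5e-4,  # weight decay reported across all datasets
)
max_epochs = 200        # "a maximum of 200 epochs"
scheduler = CosineAnnealingLR(optimizer, T_max=max_epochs)

# Placeholder objective and synthetic batch; the paper trains with
# batch size 512 on real data and its own meta-tuning loss.
criterion = nn.CrossEntropyLoss()
for epoch in range(max_epochs):
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 8192, (8,))
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()    # cosine decay of the learning rate per epoch
```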