Attention Temperature Matters in ViT-Based Cross-Domain Few-Shot Learning
Authors: Yixiong Zou, Ran Ma, Yuhua Li, Ruixuan Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on four CDFSL datasets validate the rationale of our interpretation and method, showing we can consistently outperform state-of-the-art methods. |
| Researcher Affiliation | Academia | School of Computer Science and Technology, Huazhong University of Science and Technology {yixiongz, ranma, idcliyuhua, rxli}@hust.edu.cn |
| Pseudocode | No | The paper describes methods in text and equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our codes are available at https://github.com/Zoilsen/Attn_Temp_CDFSL. |
| Open Datasets | Yes | Following current works [2, 12], we utilize the miniImageNet dataset [40] as the source dataset, and utilize 4 cross-domain datasets as the target datasets, including CropDiseases [30], EuroSAT [13], ISIC2018 [5], and ChestX [44] for few-shot training and evaluation, using the k-way n-shot classification as stated in section 2.1. |
| Dataset Splits | Yes | During the learning and testing on $D_T$, for fair comparison, current works [2, 12] adopt a k-way n-shot paradigm to sample from $D_T$ to construct small datasets (i.e., episodes) consisting of k classes and n training samples in each class. Based on episodes, the model learns from these $k \times n$ samples (a.k.a. support set, $\{x_{ij}, y_i\}_{i=1,j=1}^{k,n}$) and is evaluated on testing samples from these k classes (a.k.a. query set, $\{x_q\}$). (A minimal episode-sampling sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for any software dependencies or libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We use the Adam [16] optimizer with a learning rate of 0.001 for the classifier and $10^{-6}$ for the backbone network. During the target-domain few-shot evaluation, we set the temperature for the first two blocks to 0.3, and set the attention of the CLS token to 0 for blocks whose ID is greater than 4. (See the attention-temperature sketch below the table.) |
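
For context on the k-way n-shot protocol quoted in the Dataset Splits row, here is a minimal sketch of episode sampling. The function and variable names (`sample_episode`, `n_query`, a `dataset` given as a list of (image, label) pairs) are illustrative assumptions, not the authors' data-loading code.

```python
import random
from collections import defaultdict

def sample_episode(dataset, k=5, n=1, n_query=15):
    """Sample one k-way n-shot episode from a list of (image, label) pairs."""
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset):
        by_class[label].append(idx)

    # Draw k classes, then n support and n_query query samples per class.
    classes = random.sample(sorted(by_class), k)
    support, query = [], []
    for new_label, cls in enumerate(classes):
        picked = random.sample(by_class[cls], n + n_query)
        # Relabel classes to 0..k-1 within the episode.
        support += [(dataset[i][0], new_label) for i in picked[:n]]
        query += [(dataset[i][0], new_label) for i in picked[n:]]
    return support, query
```

The model adapts on the `support` pairs and is scored on `query`; accuracy is typically averaged over many such episodes.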
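The Experiment Setup row references the paper's core mechanism: rescaling the attention logits with a temperature and masking CLS attention in later blocks. Below is a minimal PyTorch sketch of temperature-scaled attention; the function name, the `zero_cls_attn` flag, and the exact point at which CLS attention is zeroed are our assumptions, and the authoritative version is the authors' repository linked above.

```python
import torch
import torch.nn.functional as F

def attention_with_temperature(q, k, v, temperature=1.0, zero_cls_attn=False):
    """Scaled dot-product attention with an extra temperature on the logits.

    q, k, v: (batch, heads, tokens, head_dim); token 0 is assumed to be CLS.
    A temperature below 1 sharpens the softmax; above 1 flattens it.
    """
    scale = q.size(-1) ** -0.5
    logits = (q @ k.transpose(-2, -1)) * scale
    attn = F.softmax(logits / temperature, dim=-1)
    if zero_cls_attn:
        # One plausible reading of "set the attention of the CLS token to 0":
        # zero the column for the CLS key, then renormalize each row.
        attn = attn.clone()
        attn[..., 0] = 0.0
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return attn @ v
```

Under the quoted configuration, `temperature=0.3` would be passed in the first two transformer blocks and `zero_cls_attn=True` in blocks whose ID is greater than 4.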