Attention Temperature Matters in ViT-Based Cross-Domain Few-Shot Learning

Authors: Yixiong Zou, Ran Ma, Yuhua Li, Ruixuan Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on four CDFSL datasets validate the rationale of our interpretation and method, showing we can consistently outperform state-of-the-art methods."
Researcher Affiliation | Academia | School of Computer Science and Technology, Huazhong University of Science and Technology; {yixiongz, ranma, idcliyuhua, rxli}@hust.edu.cn
Pseudocode | No | The paper describes its methods in text and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our codes are available at https://github.com/Zoilsen/Attn_Temp_CDFSL."
Open Datasets | Yes | "Following current works [2, 12], we utilize the miniImageNet dataset [40] as the source dataset, and utilize 4 cross-domain datasets as the target datasets, including CropDiseases [30], EuroSAT [13], ISIC2018 [5] and ChestX [44] for few-shot training and evaluation, using the k-way n-shot classification as stated in Section 2.1."
Dataset Splits | Yes | "During the learning and testing on $D_T$, for fair comparison, current works [2, 12] adopt a k-way n-shot paradigm to sample from $D_T$ to construct small datasets (i.e., episodes) consisting of $k$ classes and $n$ training samples in each class. Based on episodes, the model learns from these $k \times n$ samples (a.k.a. support set, $\{x_{ij}, y_i\}_{i=1,j=1}^{k,n}$) and is evaluated on testing samples from these $k$ classes (a.k.a. query set, $\{x_q\}$)." A minimal episode-sampling sketch follows the table.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for any software dependencies or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "We use the Adam [16] optimizer with a learning rate of 0.001 for the classifier and $10^{-6}$ for the backbone network. During the target-domain few-shot evaluation, we set the temperature for the first two blocks as 0.3, and set the attention of the CLS token to 0 for blocks whose ID is greater than 4." A configuration sketch follows the table.
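The k-way n-shot episode construction quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal illustration, assuming an in-memory dataset of (sample, label) pairs; the function name `sample_episode` and the `n_query` parameter are hypothetical conveniences, not taken from the authors' repository.

```python
import random
from collections import defaultdict

def sample_episode(dataset, k=5, n=1, n_query=15, seed=None):
    """Sample one k-way n-shot episode from a labeled dataset.

    `dataset` is a list of (sample, label) pairs. The support set holds
    k*n training samples and the query set holds n_query test samples
    per class, mirroring the episodic protocol quoted above.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    classes = rng.sample(sorted(by_class), k)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picks = rng.sample(by_class[cls], n + n_query)
        support += [(x, episode_label) for x in picks[:n]]
        query += [(x, episode_label) for x in picks[n:]]
    return support, query

# Toy usage: 10 classes with 30 dummy samples each.
toy = [(f"img_{c}_{i}", c) for c in range(10) for i in range(30)]
support, query = sample_episode(toy, k=5, n=1, seed=0)
print(len(support), len(query))  # 5 and 75
```

Each episode relabels its k sampled classes to 0..k-1, so the few-shot classifier is trained and evaluated per episode rather than over the global label space.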
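The Experiment Setup row combines two reproducible details: per-module learning rates and block-wise attention settings. The sketch below shows one way to realize them in PyTorch, assuming a simplified attention-only module; `TempAttention`, the embedding width 384, 6 heads, 12 blocks, and the 5-way classifier head are illustrative placeholders, and the authors' repository (https://github.com/Zoilsen/Attn_Temp_CDFSL) may structure this differently.

```python
import torch
import torch.nn as nn

class TempAttention(nn.Module):
    """Self-attention with a softmax temperature and an optional mask
    that zeroes the attention paid to the CLS token.

    A sketch of the mechanism only; a full ViT block would add the
    residual connections, layer norms, and MLP omitted here.
    """

    def __init__(self, dim, num_heads=6, temp=1.0, mask_cls=False):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temp = temp
        self.mask_cls = mask_cls

    def forward(self, x):                        # x: (B, N, C), token 0 = CLS
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)     # each: (B, heads, N, head_dim)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        attn = (logits / self.temp).softmax(dim=-1)  # temp < 1 sharpens attention
        if self.mask_cls:
            attn = attn.clone()
            attn[..., 0] = 0.0                   # zero attention to the CLS token
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Quoted evaluation-time settings: temperature 0.3 for the first two
# blocks, CLS attention zeroed for blocks whose ID is greater than 4.
blocks = [
    TempAttention(384, temp=0.3 if i < 2 else 1.0, mask_cls=i > 4)
    for i in range(12)
]
x = torch.randn(2, 197, 384)  # a batch of ViT-S/16-sized token sequences
for block in blocks:
    x = block(x)
print(x.shape)  # torch.Size([2, 197, 384])

# Two Adam parameter groups, matching the quoted learning rates:
# 1e-3 for the classifier head and 1e-6 for the backbone.
classifier = nn.Linear(384, 5)  # the way count is a placeholder
optimizer = torch.optim.Adam([
    {"params": classifier.parameters(), "lr": 1e-3},
    {"params": [p for b in blocks for p in b.parameters()], "lr": 1e-6},
])
```

Grouping parameters this way lets a single optimizer step apply the large learning rate to the randomly initialized classifier while barely perturbing the pretrained backbone.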