Attention Temperature Matters in ViT-Based Cross-Domain Few-Shot Learning

Authors: Yixiong Zou, Ran Ma, Yuhua Li, Ruixuan Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on four CDFSL datasets validate the rationale of our interpretation and method, showing we can consistently outperform state-of-the-art methods."
Researcher Affiliation | Academia | School of Computer Science and Technology, Huazhong University of Science and Technology; {yixiongz, ranma, idcliyuhua, rxli}@hust.edu.cn
Pseudocode | No | The paper describes its methods in text and equations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "Our codes are available at https://github.com/Zoilsen/Attn_Temp_CDFSL."
Open Datasets | Yes | "Following current works [2, 12], we utilize the miniImageNet dataset [40] as the source dataset, and utilize 4 cross-domain datasets as the target datasets, including CropDiseases [30], EuroSAT [13], ISIC2018 [5] and ChestX [44] for few-shot training and evaluation, using the k-way n-shot classification as stated in Section 2.1."
Dataset Splits | Yes | "During the learning and testing on $D_T$, for fair comparison, current works [2, 12] adopt a k-way n-shot paradigm to sample from $D_T$ to construct small datasets (i.e., episodes) consisting of $k$ classes and $n$ training samples in each class. Based on episodes, the model learns from these $k \times n$ samples (a.k.a. support set, $\{x_{ij}, y_i\}_{i=1,j=1}^{k,n}$) and is evaluated on testing samples from these $k$ classes (a.k.a. query set, $\{x_q\}$)." A minimal episode-sampling sketch follows the table.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify version numbers for any software dependencies or libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "We use the Adam [16] optimizer with a learning rate of 0.001 for the classifier and $10^{-6}$ for the backbone network. During the target-domain few-shot evaluation, we set the temperature for the first two blocks as 0.3, and set the attention of the CLS token to 0 for blocks whose ID is greater than 4." A configuration sketch follows the table.
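The k-way n-shot episode construction quoted in the Dataset Splits row can be made concrete with a short sketch. This is a minimal illustration, assuming an in-memory dataset of (sample, label) pairs; the function name `sample_episode` and the `n_query` parameter are hypothetical conveniences, not taken from the authors' repository.

```python
import random
from collections import defaultdict

def sample_episode(dataset, k=5, n=1, n_query=15, seed=None):
    """Sample one k-way n-shot episode from a labeled dataset.

    `dataset` is a list of (sample, label) pairs. The support set holds
    k*n training samples and the query set holds n_query test samples
    per class, mirroring the episodic protocol quoted above.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in dataset:
        by_class[y].append(x)

    classes = rng.sample(sorted(by_class), k)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picks = rng.sample(by_class[cls], n + n_query)
        support += [(x, episode_label) for x in picks[:n]]
        query += [(x, episode_label) for x in picks[n:]]
    return support, query

# Toy usage: 10 classes with 30 dummy samples each.
toy = [(f"img_{c}_{i}", c) for c in range(10) for i in range(30)]
support, query = sample_episode(toy, k=5, n=1, seed=0)
print(len(support), len(query))  # 5 and 75
```

Each episode relabels its k sampled classes to 0..k-1, so the few-shot classifier is trained and evaluated per episode rather than over the global label space.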
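The Experiment Setup row combines two reproducible details: per-module learning rates and block-wise attention settings. The sketch below shows one way to realize them in PyTorch, assuming a simplified attention-only module; `TempAttention`, the embedding width 384, 6 heads, 12 blocks, and the 5-way classifier head are illustrative placeholders, and the authors' repository (https://github.com/Zoilsen/Attn_Temp_CDFSL) may structure this differently.

```python
import torch
import torch.nn as nn

class TempAttention(nn.Module):
    """Self-attention with a softmax temperature and an optional mask
    that zeroes the attention paid to the CLS token.

    A sketch of the mechanism only; a full ViT block would add the
    residual connections, layer norms, and MLP omitted here.
    """

    def __init__(self, dim, num_heads=6, temp=1.0, mask_cls=False):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temp = temp
        self.mask_cls = mask_cls

    def forward(self, x):                        # x: (B, N, C), token 0 = CLS
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)     # each: (B, heads, N, head_dim)
        logits = (q @ k.transpose(-2, -1)) * self.scale
        attn = (logits / self.temp).softmax(dim=-1)  # temp < 1 sharpens attention
        if self.mask_cls:
            attn = attn.clone()
            attn[..., 0] = 0.0                   # zero attention to the CLS token
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

# Quoted evaluation-time settings: temperature 0.3 for the first two
# blocks, CLS attention zeroed for blocks whose ID is greater than 4.
blocks = [
    TempAttention(384, temp=0.3 if i < 2 else 1.0, mask_cls=i > 4)
    for i in range(12)
]
x = torch.randn(2, 197, 384)  # a batch of ViT-S/16-sized token sequences
for block in blocks:
    x = block(x)
print(x.shape)  # torch.Size([2, 197, 384])

# Two Adam parameter groups, matching the quoted learning rates:
# 1e-3 for the classifier head and 1e-6 for the backbone.
classifier = nn.Linear(384, 5)  # the way count is a placeholder
optimizer = torch.optim.Adam([
    {"params": classifier.parameters(), "lr": 1e-3},
    {"params": [p for b in blocks for p in b.parameters()], "lr": 1e-6},
])
```

Grouping parameters this way lets a single optimizer step apply the large learning rate to the randomly initialized classifier while barely perturbing the pretrained backbone.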