A Closer Look at the CLS Token for Cross-Domain Few-Shot Learning

Authors: Yixiong Zou, Shuai Yi, Yuhua Li, Ruixuan Li

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four benchmarks validate our rationale and state-of-the-art performance.
Researcher Affiliation | Academia | Yixiong Zou¹, Shuai Yi², Yuhua Li¹, Ruixuan Li¹; ¹School of Computer Science and Technology, ²School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
Pseudocode | No | Not found. The paper describes the method narratively but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our codes are available at https://github.com/Zoilsen/CLS_Token_CDFSL.
Open Datasets | Yes | Following current works [28, 33], we utilize the miniImageNet dataset [34] as our source domain, with around 60k images and 100 annotated classes. We train our models on the training split of the source dataset, then finetune and evaluate the generalization performance on four target-domain datasets, CropDisease [25], EuroSAT [16], ISIC [5], and ChestX [38], which are cross-domain datasets from the domains of agriculture, remote sensing, and medical data (with significant domain gaps).
Dataset Splits | Yes | During learning on each target dataset, the model is adapted on the sampled support set $\{x^T_{ij}, y^T_{ij}\}_{i=1,j=1}^{K,M}$, which is called a K-way M-shot task (i.e., K classes in each support set with M samples per class). Finally, the model is evaluated on the query set $\{x^T_q\}$. (A minimal episode-sampling sketch follows the table.)
Hardware Specification | Yes | Experiments are conducted on NVIDIA A5000 GPUs.
Software Dependencies | No | Not found. The paper does not specify version numbers for ancillary software dependencies.
Experiment Setup | Yes | In implementation, we set the domain number to 64, i.e., each source-domain class has a specific domain token. We set λ to 100 to keep the two losses on the same scale. We follow [1, 13, 42] to take DINO on ImageNet as the pretraining of our backbone network, and scale the learning rate of the domain token to 1% of the backbone network. (A sketch of this configuration follows the table.)
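
A minimal episode-sampling sketch, assuming a PyTorch-style workflow, is given below to make the K-way M-shot protocol quoted in the Dataset Splits row concrete. The helper name sample_episode, the toy label list, and the query size of 15 images per class are illustrative assumptions, not code from the authors' repository.

```python
import random
from collections import defaultdict

def sample_episode(labels, k_way=5, m_shot=5, q_queries=15):
    """Sample one K-way M-shot episode: dataset indices for a support set
    (K classes x M images) and a query set (K classes x q_queries images).
    Hypothetical helper; names and the query size are assumptions."""
    # Group dataset indices by class label.
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    # Pick K classes, then M support and q_queries query images per class.
    classes = random.sample(list(by_class), k_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        chosen = random.sample(by_class[c], m_shot + q_queries)
        support += [(i, episode_label) for i in chosen[:m_shot]]
        query += [(i, episode_label) for i in chosen[m_shot:]]
    return support, query

# Example: one 5-way 5-shot episode over a toy label list
# (10 classes, 30 images each).
toy_labels = [i // 30 for i in range(300)]
support, query = sample_episode(toy_labels)
print(len(support), len(query))  # 25 support pairs, 75 query pairs
```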
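
The Experiment Setup row names two concrete knobs: the loss weight λ = 100 and a domain-token learning rate scaled to 1% of the backbone's. The sketch below shows one common way to express such a configuration with PyTorch parameter groups; the ToyModel class, the base learning rate of 1e-3, and the loss names cls_loss and aux_loss are placeholders, not the authors' actual identifiers or values.

```python
import torch
import torch.nn as nn

class ToyModel(nn.Module):
    """Placeholder standing in for the paper's ViT backbone plus
    64 learnable domain tokens (one per source-domain class)."""
    def __init__(self, dim=384, num_domains=64):
        super().__init__()
        self.backbone = nn.Linear(dim, dim)  # stand-in for the ViT backbone
        self.domain_tokens = nn.Parameter(torch.zeros(num_domains, 1, dim))

model = ToyModel()
base_lr = 1e-3  # assumed backbone learning rate (not stated in this excerpt)
lam = 100.0     # loss weight lambda = 100, as quoted above

# Domain-token learning rate scaled to 1% of the backbone's.
optimizer = torch.optim.AdamW([
    {"params": model.backbone.parameters(), "lr": base_lr},
    {"params": [model.domain_tokens], "lr": base_lr * 0.01},
])

def total_loss(cls_loss, aux_loss):
    # lambda keeps the two loss terms on the same scale.
    return cls_loss + lam * aux_loss
```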