Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning
Authors: Kun Ding, Haojian Zhang, Qiang Yu, Ying Wang, Shiming Xiang, Chunhong Pan
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that even weak distribution detectors can still improve the generalization ability of VLMs (a hedged sketch of this detector-based dispatch idea follows the table). |
| Researcher Affiliation | Academia | (1) State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; (2) Engineering Laboratory for Intelligent Industrial Vision, Institute of Automation, Chinese Academy of Sciences; (3) Research Center of Aerospace Information, Institute of Automation, Chinese Academy of Sciences |
| Pseudocode | No | The paper describes the proposed method using text and mathematical equations, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states, 'With the public code of CoOp, CoCoOp, ProGrad and KgCoOp, we reproduce the results on the aforementioned datasets using the suggested parameters.' However, it does not provide a link to, or any other concrete means of accessing, the authors' own implementation of the method described in this paper. |
| Open Datasets | Yes | Following prior works (Zhou et al. 2022b), for the base-to-novel experiment, we use 11 image classification datasets: ImageNet (Deng et al. 2009) and Caltech101 (Li, Fergus, and Perona 2007) for generic object classification; Food101 (Bossard, Guillaumin, and Gool 2014), StanfordCars (Krause et al. 2013), OxfordPets (Parkhi et al. 2012), Flowers102 (Nilsback and Zisserman 2008) and FGVCAircraft (Maji et al. 2013) for fine-grained visual classification; UCF101 (Soomro, Zamir, and Shah 2012) for action recognition; EuroSAT (Helber et al. 2019) for satellite image classification; DTD (Cimpoi et al. 2014) for texture classification; and SUN397 (Xiao et al. 2010) for scene recognition. |
| Dataset Splits | Yes | In this setting, we split the train and test samples in the 11 datasets into two groups: base classes (Base) and novel classes (Novel). The two sets do not share any identical classes. We train VLPT methods on base classes and evaluate them on novel classes. (A minimal sketch of this split protocol follows the table.) |
| Hardware Specification | Yes | All experiments are conducted on an NVIDIA RTX A4000 GPU. |
| Software Dependencies | No | The paper mentions using models like CLIP, ALIGN and BLIP, and reproducing results with the public code of CoOp, CoCoOp, ProGrad, and KgCoOp. However, it does not specify version numbers for general software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Implementation Details. With the public code of CoOp, CoCoOp, ProGrad and KgCoOp, we reproduce the results on the aforementioned datasets using the suggested parameters. For CoOp, the context length is 16 and random initialization is adopted. The batch size is 32 and 50 epochs are trained. For CoCoOp, the context is initialized with 'a photo of a', the batch size is 1 and 10 epochs are trained. For ProGrad, the training setting is identical to CoOp and the two extra hyper-parameters are set to 1.0. For KgCoOp, the context is initialized with 'a photo of a' and the additional hyper-parameter λ is set to 8.0. Due to limited GPU memory, the batch size is set to 32. In addition, 100 epochs are trained. All methods use the same learning rate scheduler and optimizer. (These settings are collected into a config sketch after the table.) |
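
The Research Type row summarizes the paper's central claim: a distribution detector, even a weak one, can route test samples between tuned prompts (for base classes) and zero-shot prompts (for novel classes). The following is a minimal PyTorch sketch of that dispatch idea; the max-softmax-probability score, the fixed threshold, the temperature, and all tensor names are our assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def detect_and_dispatch(image_feat, tuned_text_feats, zs_text_feats,
                        threshold=0.5, temp=0.01):
    """Route one test image between two prompt classifiers using a weak
    OOD score. All features are assumed L2-normalized; the max-softmax-
    probability (MSP) score and the fixed threshold are illustrative
    assumptions, not the paper's exact detector."""
    # Score in-distribution-ness against the base-class (tuned) prompts.
    base_logits = image_feat @ tuned_text_feats.t() / temp
    msp = F.softmax(base_logits, dim=-1).max().item()
    if msp >= threshold:
        # Likely a base-class sample: trust the learned (tuned) prompts.
        logits = base_logits
    else:
        # Likely a novel-class sample: fall back to hand-crafted prompts.
        logits = image_feat @ zs_text_feats.t() / temp
    return logits.argmax(dim=-1)

# Toy usage with random, normalized 512-d features (CLIP ViT-B/16 width).
torch.manual_seed(0)
img   = F.normalize(torch.randn(1, 512), dim=-1)
tuned = F.normalize(torch.randn(10, 512), dim=-1)  # prompts tuned on base classes
zs    = F.normalize(torch.randn(10, 512), dim=-1)  # hand-crafted zero-shot prompts
print(detect_and_dispatch(img, tuned, zs))
```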
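For the base-to-novel protocol quoted in the Dataset Splits row, the usual convention in prior work (e.g., the CoOp/CoCoOp codebase) is a deterministic half-split of the sorted class list. The sketch below illustrates that convention; the sorted half-split and the (image_path, class_name) record format are assumptions, and the exact split used in the paper may differ.

```python
import math

def base_novel_split(class_names):
    """Split a dataset's classes into two disjoint halves: base classes
    for prompt tuning, novel classes for evaluation. The sorted half-split
    follows the convention of prior base-to-novel work."""
    names = sorted(set(class_names))
    m = math.ceil(len(names) / 2)
    return names[:m], names[m:]  # (base, novel)

def keep_classes(samples, classes):
    """Filter (image_path, class_name) records to the given classes;
    this record format is a hypothetical stand-in for a real loader."""
    classes = set(classes)
    return [(path, c) for path, c in samples if c in classes]

# Example: train on base classes, evaluate on novel classes.
base, novel = base_novel_split(["cat", "dog", "owl", "yak"])
train = keep_classes([("img0.jpg", "cat"), ("img1.jpg", "owl")], base)
test  = keep_classes([("img2.jpg", "yak"), ("img3.jpg", "dog")], novel)
print(base, novel, train, test)
```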
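Finally, the hyper-parameters quoted in the Experiment Setup row can be collected into one configuration sketch. The key names, the placeholder name for ProGrad's two extra hyper-parameters, and the shared SGD-plus-cosine settings (the CoOp codebase defaults, not named explicitly in the quote) are assumptions.

```python
# Hyper-parameters as quoted from the paper's implementation details.
# The shared optimizer/scheduler values are assumptions (CoOp defaults).
COMMON = {"optimizer": "sgd", "lr_scheduler": "cosine"}

CONFIGS = {
    # Context length 16, random initialization.
    "CoOp":    {"n_ctx": 16, "ctx_init": None, "batch_size": 32, "epochs": 50},
    # Same training setting as CoOp; the paper's two extra hyper-parameters
    # are both set to 1.0 (the key name here is a placeholder).
    "ProGrad": {"n_ctx": 16, "ctx_init": None, "batch_size": 32, "epochs": 50,
                "extra_hparams": (1.0, 1.0)},
    "CoCoOp":  {"ctx_init": "a photo of a", "batch_size": 1, "epochs": 10},
    # Batch size 32 due to limited GPU memory; lambda set to 8.0.
    "KgCoOp":  {"ctx_init": "a photo of a", "batch_size": 32, "epochs": 100,
                "lambda": 8.0},
}

def config_for(method):
    """Merge the shared settings with a method-specific config."""
    return {**COMMON, **CONFIGS[method]}

print(config_for("KgCoOp"))
```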