Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Authors: Kun Ding, Haojian Zhang, Qiang Yu, Ying Wang, Shiming Xiang, Chunhong Pan

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that even weak distribution detectors can still improve VLMs' generalization ability.
Researcher Affiliation | Academia | (1) State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; (2) Engineering Laboratory for Intelligent Industrial Vision, Institute of Automation, Chinese Academy of Sciences; (3) Research Center of Aerospace Information, Institute of Automation, Chinese Academy of Sciences
Pseudocode | No | The paper describes the proposed method using text and mathematical equations, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states, 'With the public code of CoOp, CoCoOp, ProGrad and KgCoOp, we reproduce the results on the aforementioned datasets using the suggested parameters.' However, it does not provide a link to, or any other concrete access to, the authors' own implementation of the method described in this paper.
Open Datasets | Yes | Following prior works (Zhou et al. 2022b), for the base-to-novel experiment, we use 11 image classification datasets: ImageNet (Deng et al. 2009) and Caltech101 (Li, Fergus, and Perona 2007) for generic object classification; Food101 (Bossard, Guillaumin, and Gool 2014), StanfordCars (Krause et al. 2013), OxfordPets (Parkhi et al. 2012), Flowers102 (Nilsback and Zisserman 2008) and FGVCAircraft (Maji et al. 2013) for fine-grained visual classification; UCF101 (Soomro, Zamir, and Shah 2012) for action recognition; EuroSAT (Helber et al. 2019) for satellite image classification; DTD (Cimpoi et al. 2014) for texture classification; SUN397 (Xiao et al. 2010) for scene recognition. (The 11 datasets are grouped by task in a sketch after the table.)
Dataset Splits | Yes | In this setting, we split the train and test samples in the 11 datasets into two groups: base classes (Base) and novel classes (Novel). The two sets do not share any identical classes. We train VLPT methods on the base classes and evaluate them on the novel classes. (A minimal sketch of this split appears after the table.)
Hardware Specification | Yes | All experiments are conducted on an RTX A4000 GPU.
Software Dependencies | No | The paper mentions using models such as CLIP, ALIGN and BLIP, and reproducing results with the public code of CoOp, CoCoOp, ProGrad, and KgCoOp. However, it does not specify version numbers for general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Implementation Details. With the public code of CoOp, CoCoOp, ProGrad and KgCoOp, we reproduce the results on the aforementioned datasets using the suggested parameters. For CoOp, the context length is 16 and random initialization is adopted; the batch size is 32 and 50 epochs are trained. For CoCoOp, the context is initialized with 'a photo of a', the batch size is 1 and 10 epochs are trained. For ProGrad, the training setting is identical to CoOp and the two extra hyper-parameters are set to 1.0. For KgCoOp, the context is initialized with 'a photo of a' and the additional hyper-parameter λ is set to 8.0; due to limited GPU memory, the batch size is set to 32, and 100 epochs are trained. All methods use the same learning rate scheduler and optimizer. (These per-method settings are collected into a configuration sketch after the table.)
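The 11 datasets quoted in the Open Datasets row can be grouped by task for quick reference. The listing below is a sketch that simply restates the dataset names from the quote; the identifier is our own, not from the paper or any released codebase.

```python
# The 11 datasets used for the base-to-novel experiment, grouped by task.
BASE_TO_NOVEL_DATASETS = {
    "generic object classification": ["ImageNet", "Caltech101"],
    "fine-grained visual classification": [
        "Food101", "StanfordCars", "OxfordPets", "Flowers102", "FGVCAircraft",
    ],
    "action recognition": ["UCF101"],
    "satellite image classification": ["EuroSAT"],
    "texture classification": ["DTD"],
    "scene recognition": ["SUN397"],
}
```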
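The base-to-novel protocol in the Dataset Splits row divides each dataset's classes into two disjoint groups, tunes prompts only on the base group, and evaluates on the held-out novel group. Below is a minimal sketch assuming a simple sorted half-and-half split, which is the convention of prior prompt-tuning work; the function and variable names are illustrative, not taken from the paper.

```python
from typing import List, Tuple

def split_base_novel(classnames: List[str]) -> Tuple[List[str], List[str]]:
    """Split the full class list into disjoint base and novel halves."""
    classnames = sorted(classnames)               # deterministic ordering
    mid = (len(classnames) + 1) // 2              # first half -> base, rest -> novel
    base_classes, novel_classes = classnames[:mid], classnames[mid:]
    assert not set(base_classes) & set(novel_classes)  # the two sets share no classes
    return base_classes, novel_classes

# Prompts are tuned only on base_classes; novel_classes are held out and used
# solely to evaluate how well the tuned prompts generalize.
base_classes, novel_classes = split_base_novel(["cat", "dog", "car", "plane"])
```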
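Finally, the per-method hyper-parameters from the Experiment Setup row can be collected into a single configuration sketch. The dictionary below only restates the quoted values; the key names are our own and do not correspond to any released codebase.

```python
# Per-method hyper-parameters reported in the Implementation Details.
# Key names are illustrative; values restate the quoted text.
VLPT_CONFIGS = {
    "CoOp": {
        "context_length": 16,
        "context_init": None,            # random initialization
        "batch_size": 32,
        "epochs": 50,
    },
    "CoCoOp": {
        "context_init": "a photo of a",
        "batch_size": 1,
        "epochs": 10,
    },
    "ProGrad": {
        # Training setting identical to CoOp; the two extra hyper-parameters are 1.0.
        "context_length": 16,
        "context_init": None,
        "batch_size": 32,
        "epochs": 50,
        "extra_hparams": (1.0, 1.0),
    },
    "KgCoOp": {
        "context_init": "a photo of a",
        "lambda_kg": 8.0,                # additional hyper-parameter λ
        "batch_size": 32,                # reduced due to limited GPU memory
        "epochs": 100,
    },
}
```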