Weak Distribution Detectors Lead to Stronger Generalizability of Vision-Language Prompt Tuning

Authors: Kun Ding, Haojian Zhang, Qiang Yu, Ying Wang, Shiming Xiang, Chunhong Pan

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that even weak distribution detectors can still improve VLMs' generalization ability.
Researcher Affiliation | Academia | (1) State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; (2) Engineering Laboratory for Intelligent Industrial Vision, Institute of Automation, Chinese Academy of Sciences; (3) Research Center of Aerospace Information, Institute of Automation, Chinese Academy of Sciences
Pseudocode | No | The paper describes the proposed method using text and mathematical equations, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper states, 'With the public code of CoOp, CoCoOp, ProGrad and KgCoOp, we reproduce the results on the aforementioned datasets using the suggested parameters.' However, it does not provide a link to, or any other concrete access to, the authors' own implementation of the method described in this paper.
Open Datasets | Yes | Following prior works (Zhou et al. 2022b), for the base-to-novel experiment, we use 11 image classification datasets: ImageNet (Deng et al. 2009) and Caltech101 (Li, Fergus, and Perona 2007) for generic object classification; Food101 (Bossard, Guillaumin, and Gool 2014), StanfordCars (Krause et al. 2013), OxfordPets (Parkhi et al. 2012), Flowers102 (Nilsback and Zisserman 2008) and FGVCAircraft (Maji et al. 2013) for fine-grained visual classification; UCF101 (Soomro, Zamir, and Shah 2012) for action recognition; EuroSAT (Helber et al. 2019) for satellite image classification; DTD (Cimpoi et al. 2014) for texture classification; SUN397 (Xiao et al. 2010) for scene recognition. (The 11 datasets are grouped by task in a sketch after the table.)
Dataset Splits | Yes | In this setting, we split the train and test samples in the 11 datasets into two groups: base classes (Base) and novel classes (Novel). The two sets do not share any identical classes. We train VLPT methods on the base classes and evaluate them on the novel classes. (A minimal sketch of this split appears after the table.)
Hardware Specification | Yes | All experiments are conducted on an RTX A4000 GPU.
Software Dependencies | No | The paper mentions using models such as CLIP, ALIGN and BLIP, and reproducing results with the public code of CoOp, CoCoOp, ProGrad, and KgCoOp. However, it does not specify version numbers for general software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | Implementation Details. With the public code of CoOp, CoCoOp, ProGrad and KgCoOp, we reproduce the results on the aforementioned datasets using the suggested parameters. For CoOp, the context length is 16 and random initialization is adopted; the batch size is 32 and 50 epochs are trained. For CoCoOp, the context is initialized with 'a photo of a', the batch size is 1 and 10 epochs are trained. For ProGrad, the training setting is identical to CoOp and the two extra hyper-parameters are set to 1.0. For KgCoOp, the context is initialized with 'a photo of a' and the additional hyper-parameter λ is set to 8.0; due to limited GPU memory, the batch size is set to 32, and 100 epochs are trained. All methods use the same learning rate scheduler and optimizer. (These per-method settings are collected into a configuration sketch after the table.)
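The 11 datasets quoted in the Open Datasets row can be grouped by task for quick reference. The listing below is a sketch that simply restates the dataset names from the quote; the identifier is our own, not from the paper or any released codebase.

```python
# The 11 datasets used for the base-to-novel experiment, grouped by task.
BASE_TO_NOVEL_DATASETS = {
    "generic object classification": ["ImageNet", "Caltech101"],
    "fine-grained visual classification": [
        "Food101", "StanfordCars", "OxfordPets", "Flowers102", "FGVCAircraft",
    ],
    "action recognition": ["UCF101"],
    "satellite image classification": ["EuroSAT"],
    "texture classification": ["DTD"],
    "scene recognition": ["SUN397"],
}
```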
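The base-to-novel protocol in the Dataset Splits row divides each dataset's classes into two disjoint groups, tunes prompts only on the base group, and evaluates on the held-out novel group. Below is a minimal sketch assuming a simple sorted half-and-half split, which is the convention of prior prompt-tuning work; the function and variable names are illustrative, not taken from the paper.

```python
from typing import List, Tuple

def split_base_novel(classnames: List[str]) -> Tuple[List[str], List[str]]:
    """Split the full class list into disjoint base and novel halves."""
    classnames = sorted(classnames)               # deterministic ordering
    mid = (len(classnames) + 1) // 2              # first half -> base, rest -> novel
    base_classes, novel_classes = classnames[:mid], classnames[mid:]
    assert not set(base_classes) & set(novel_classes)  # the two sets share no classes
    return base_classes, novel_classes

# Prompts are tuned only on base_classes; novel_classes are held out and used
# solely to evaluate how well the tuned prompts generalize.
base_classes, novel_classes = split_base_novel(["cat", "dog", "car", "plane"])
```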
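Finally, the per-method hyper-parameters from the Experiment Setup row can be collected into a single configuration sketch. The dictionary below only restates the quoted values; the key names are our own and do not correspond to any released codebase.

```python
# Per-method hyper-parameters reported in the Implementation Details.
# Key names are illustrative; values restate the quoted text.
VLPT_CONFIGS = {
    "CoOp": {
        "context_length": 16,
        "context_init": None,            # random initialization
        "batch_size": 32,
        "epochs": 50,
    },
    "CoCoOp": {
        "context_init": "a photo of a",
        "batch_size": 1,
        "epochs": 10,
    },
    "ProGrad": {
        # Training setting identical to CoOp; the two extra hyper-parameters are 1.0.
        "context_length": 16,
        "context_init": None,
        "batch_size": 32,
        "epochs": 50,
        "extra_hparams": (1.0, 1.0),
    },
    "KgCoOp": {
        "context_init": "a photo of a",
        "lambda_kg": 8.0,                # additional hyper-parameter λ
        "batch_size": 32,                # reduced due to limited GPU memory
        "epochs": 100,
    },
}
```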