Homology Consistency Constrained Efficient Tuning for Vision-Language Models

Authors: Huatian Zhang, Lei Zhang, Yongdong Zhang, Zhendong Mao

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on few-shot learning over 11 datasets and domain generalization demonstrate the effectiveness and robustness of our method.
Researcher Affiliation | Academia | Huatian Zhang, Lei Zhang, Yongdong Zhang, Zhendong Mao; University of Science and Technology of China; huatianzhang@mail.ustc.edu.cn, {leizh23,zhyd73,zdmao}@ustc.edu.cn
Pseudocode | No | The paper describes its methodological steps verbally but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is publicly available at https://github.com/htzhang-code/HC
Open Datasets | Yes | We conduct the few-shot learning evaluation on 11 benchmark datasets including Caltech101 [48], DTD [49], EuroSAT [50], FGVCAircraft [51], Flowers102 [52], Food101 [53], ImageNet [54], Oxford Pets [55], Stanford Cars [56], SUN397 [57] and UCF101 [58].
Dataset Splits | No | We sample 1, 2, 4, 8 and 16 shots per class, respectively, for model training and evaluate on full test sets. The paper does not explicitly specify a validation split or how it is used.
Hardware Specification | Yes | All experiments are conducted on a single NVIDIA A40 GPU.
Software Dependencies | No | The paper mentions optimizers such as the Adam and AdamW optimizers and frameworks such as CLIP and PyTorch (the latter implicitly, through citing CLIP), but does not provide specific version numbers for critical software components or libraries.
Experiment Setup | Yes | The training batch size is 256. We employ the Adam optimizer with an initial learning rate of 1e-4 on ImageNet and 1e-3 on others, and the learning rates decay with a cosine learning rate schedule following TaskRes. ... We set the initial learning rate as 1e-3. All experiments are conducted on a single NVIDIA A40 GPU.
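
The Dataset Splits row reports that training subsets are drawn as 1, 2, 4, 8, and 16 shots per class, with evaluation on the full test sets. A minimal sketch of that sampling protocol is shown below; `sample_few_shot`, its arguments, and the `(image_path, label)` record format are hypothetical names for illustration, not identifiers from the released code.

```python
import random
from collections import defaultdict

def sample_few_shot(samples, num_shots, seed=0):
    """Draw `num_shots` training examples per class from (image_path, label) pairs.

    Hypothetical helper illustrating the reported protocol: 1/2/4/8/16 shots
    per class are sampled for training, and evaluation uses the full test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))

    subset = []
    for label, items in by_class.items():
        # Guard against classes with fewer examples than the requested shot count.
        subset.extend(rng.sample(items, min(num_shots, len(items))))
    return subset
```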
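
The Experiment Setup row reports a batch size of 256, the Adam optimizer with an initial learning rate of 1e-4 on ImageNet and 1e-3 on the other datasets, and cosine learning-rate decay. A minimal PyTorch sketch of that configuration follows; `tuned_params` and `num_epochs` are placeholders, since the excerpt does not state which parameters are tuned or for how many epochs.

```python
import torch

# Placeholder for whatever parameters the method actually tunes.
tuned_params = [torch.nn.Parameter(torch.zeros(512))]

# Initial learning rate of 1e-3 (1e-4 would be used on ImageNet, per the paper).
optimizer = torch.optim.Adam(tuned_params, lr=1e-3)

num_epochs = 50  # assumed value; the excerpt does not state the epoch count
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... one training epoch with batch size 256 would run here ...
    scheduler.step()
```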