Bridge the Modality and Capability Gaps in Vision-Language Model Selection

Authors: Chao Yi, Yuhang He, De-Chuan Zhan, Han-Jia Ye

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across various VLMs and image classification datasets validate SWAB's effectiveness. Code is available at: https://github.com/YCaigogogo/SWAB.
Researcher Affiliation | Academia | Chao Yi, Yu-Hang He, De-Chuan Zhan, Han-Jia Ye, State Key Laboratory for Novel Software Technology, Nanjing University {yic,heyh,zhandc,yehj}@lamda.nju.edu.cn
Pseudocode | Yes | Algorithm 1 shows the pseudo-code of SWAB.
Open Source Code | Yes | Code is available at: https://github.com/YCaigogogo/SWAB.
Open Datasets | Yes | We evaluate different methods on 23 datasets, i.e. ImageNet [8], Aircraft [36], CIFAR100 [26] and so on. ... Table 8: Detailed information of 23 tasks used in the LOVM Benchmark. This table comes from [73].
Dataset Splits | No | The paper describes a training phase for the ranker model on open-source datasets and testing on target datasets, but it does not explicitly provide details about a dedicated 'validation' split with percentages or sample counts for hyperparameter tuning or model selection.
Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for the experiments.
Software Dependencies | No | The paper mentions software tools and libraries like 'ChatGPT [43]', 'MPNet [49]', 'Optimal Transport [7, 45]' and 'OpenCLIP library [21]', but it does not provide specific version numbers for any of these, which are necessary for full reproducibility.
Experiment Setup | Yes | For a fair comparison, SWAB follows ModelGPT [73] to sequentially extract a target dataset from each of the 23 datasets in the LOVM Benchmark and treat the remaining datasets as open-source datasets. Besides, SWAB adopts ModelGPT's approach of adding Gaussian noise to corrupt the target dataset's generated text embeddings. ... We conduct ten repeated experiments using random seeds from 1 to 10 and report the mean value and standard deviation of ModelGPT's performance and SWAB's performance in Table 1. ... D.1 Filtering the Open-Source Tasks' Classes: λ is a threshold and we set λ = 0.5. ... D.2 Using Partial Optimal Transport for Bridging the Capability Gap: we set mass = 0.9 in our implementation. ... D.3 Data Normalization in Bridging the Modality Gap: describes z-score normalization.
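
The setup details quoted in the last row (z-score normalization, partial optimal transport with transported mass = 0.9, Gaussian-noise corruption of generated text embeddings, and repeated runs over seeds 1 to 10) can be illustrated with a minimal sketch. This is not the authors' released code: the embedding shapes, the noise scale `sigma`, and the use of the POT library's `ot.partial.partial_wasserstein` to realize partial optimal transport are assumptions made purely for illustration.

```python
# Minimal sketch of the quoted experimental ingredients (not the authors' code).
import numpy as np
import ot  # POT: Python Optimal Transport

rng = np.random.default_rng(seed=1)  # the paper repeats runs with seeds 1..10

# Toy stand-ins for class embeddings of open-source classes vs. target classes.
open_embs = rng.normal(size=(40, 512))
target_embs = rng.normal(size=(10, 512))

# D.3: z-score normalization (per embedding dimension).
def z_score(x, eps=1e-8):
    return (x - x.mean(axis=0, keepdims=True)) / (x.std(axis=0, keepdims=True) + eps)

open_embs_n, target_embs_n = z_score(open_embs), z_score(target_embs)

# D.2: partial optimal transport, transporting only mass = 0.9.
a = ot.unif(open_embs_n.shape[0])        # uniform weights over open-source classes
b = ot.unif(target_embs_n.shape[0])      # uniform weights over target classes
M = ot.dist(open_embs_n, target_embs_n)  # pairwise squared-Euclidean cost matrix
gamma = ot.partial.partial_wasserstein(a, b, M, m=0.9)  # transport 90% of the mass
print("transported mass:", gamma.sum())  # approximately 0.9

# Gaussian-noise corruption of generated text embeddings (noise scale is an
# assumption; the row above does not report the value used).
sigma = 0.1
corrupted = target_embs + rng.normal(scale=sigma, size=target_embs.shape)
```

In the paper's pipeline the transport plan would be used to transfer per-class statistics from open-source classes to target classes; here the snippet only demonstrates that roughly 90% of the probability mass is transported under the partial-OT constraint.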