Bridge the Modality and Capability Gaps in Vision-Language Model Selection
Authors: Chao Yi, Yuhang He, De-Chuan Zhan, Han-Jia Ye
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across various VLMs and image classification datasets validate SWAB's effectiveness. Code is available at: https://github.com/YCaigogogo/SWAB. |
| Researcher Affiliation | Academia | Chao Yi, Yuhang He, De-Chuan Zhan, Han-Jia Ye; State Key Laboratory for Novel Software Technology, Nanjing University; {yic,heyh,zhandc,yehj}@lamda.nju.edu.cn |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code of SWAB. |
| Open Source Code | Yes | Code is available at: https://github.com/YCaigogogo/SWAB. |
| Open Datasets | Yes | We evaluate different methods on 23 datasets, e.g., ImageNet [8], Aircraft [36], CIFAR100 [26], and so on. ... Table 8: Detailed information of 23 tasks used in the LOVM Benchmark. This table comes from [73]. |
| Dataset Splits | No | The paper describes a training phase for the ranker model on open-source datasets and testing on target datasets, but it does not explicitly provide details about a dedicated 'validation' split with percentages or sample counts for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions software tools and libraries like 'ChatGPT [43]', 'MPNet [49]', 'Optimal Transport [7, 45]' and the 'OpenCLIP library [21]', but it does not provide specific version numbers for any of these, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For a fair comparison, SWAB follows ModelGPT [73] to sequentially extract a target dataset from each of the 23 datasets in the LOVM Benchmark and treat the remaining datasets as open-source datasets. Besides, SWAB adopts ModelGPT's approach of adding Gaussian noise to corrupt the target dataset's generated text embeddings. ... We conduct ten repeated experiments using random seeds from 1 to 10 and report the mean value and standard deviation of ModelGPT's performance and SWAB's performance in Table 1. ... D.1 Filtering the Open-Source Tasks' Classes: λ is a threshold and we set λ = 0.5. ... D.2 Using Partial Optimal Transport for Bridging the Capability Gap: We set mass = 0.9 in our implementation. ... D.3 Data Normalization in Bridging the Modality Gap: describes z-score normalization. |
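
The Experiment Setup row quotes two protocol details: Gaussian-noise corruption of the target dataset's generated text embeddings, and ten repeated runs with seeds 1 to 10 reported as mean ± std. A minimal sketch of that protocol, assuming NumPy embeddings and a hypothetical `run_once(seed)` scoring function (neither is taken from the SWAB repository):

```python
import numpy as np

def corrupt_embeddings(text_emb: np.ndarray, noise_std: float, seed: int) -> np.ndarray:
    """Add zero-mean Gaussian noise to generated text embeddings.

    `noise_std` is an assumed knob; the paper only states that noise is added,
    following ModelGPT, without quoting the exact scale here.
    """
    rng = np.random.default_rng(seed)
    return text_emb + rng.normal(0.0, noise_std, size=text_emb.shape)

def evaluate_over_seeds(run_once, seeds=range(1, 11)):
    """Run a scoring function once per seed and report mean and std, as in Table 1."""
    scores = np.array([run_once(seed) for seed in seeds])
    return scores.mean(), scores.std()
```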
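
For the capability-gap step (Sec. D.2), the paper fixes the transported mass at 0.9 in its partial optimal transport. The sketch below shows how such a step could be written with the POT library's `ot.partial.partial_wasserstein`; the cosine-distance cost between class text embeddings and the uniform class weights are assumptions for illustration, not details quoted from the paper:

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def partial_ot_plan(src_emb: np.ndarray, tgt_emb: np.ndarray, mass: float = 0.9) -> np.ndarray:
    """Compute a partial OT plan that moves `mass` (0.9 in the paper) of the weight."""
    # Cosine-distance cost matrix between open-source and target class embeddings
    # (assumed choice of cost).
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    M = 1.0 - src @ tgt.T
    a = np.full(src.shape[0], 1.0 / src.shape[0])  # uniform source class weights (assumed)
    b = np.full(tgt.shape[0], 1.0 / tgt.shape[0])  # uniform target class weights (assumed)
    # Only `mass` of the total probability mass is transported.
    return ot.partial.partial_wasserstein(a, b, M, m=mass)
```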
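
Sections D.1 and D.3 mention a filtering threshold λ = 0.5 and z-score normalization. A small illustrative sketch, assuming the threshold is applied to a cosine-similarity matrix between open-source and target class embeddings (the exact quantity being thresholded is not spelled out in this excerpt):

```python
import numpy as np

def z_score_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standardize features to zero mean and unit variance per dimension (Sec. D.3)."""
    return (x - x.mean(axis=0)) / (x.std(axis=0) + eps)

def filter_classes_by_similarity(sim: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Hypothetical D.1 filter: keep open-source classes whose maximum similarity
    to any target class exceeds the threshold λ."""
    return np.where(sim.max(axis=1) > lam)[0]
```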