reproducibilityindex.ai

Bridge the Modality and Capability Gaps in Vision-Language Model Selection

Authors: Chao Yi, Yuhang He, De-Chuan Zhan, Han-Jia Ye

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments across various VLMs and image classification datasets validate SWAB's effectiveness. Code is available at: https://github.com/YCaigogogo/SWAB.
Researcher Affiliation	Academia	Chao Yi, Yu-Hang He, De-Chuan Zhan, Han-Jia Ye. State Key Laboratory for Novel Software Technology, Nanjing University. {yic,heyh,zhandc,yehj}@lamda.nju.edu.cn
Pseudocode	Yes	Algorithm 1 shows the pseudo-code of SWAB.
Open Source Code	Yes	Code is available at: https://github.com/YCaigogogo/SWAB.
Open Datasets	Yes	We evaluate different methods on 23 datasets, i.e. ImageNet [8], Aircraft [36], CIFAR100 [26] and so on. ... Table 8: Detailed information of 23 tasks used in the LOVM Benchmark. This table comes from [73].
Dataset Splits	No	The paper describes a training phase for the ranker model on open-source datasets and testing on target datasets, but it does not explicitly provide details about a dedicated 'validation' split with percentages or sample counts for hyperparameter tuning or model selection.
Hardware Specification	No	The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for the experiments.
Software Dependencies	No	The paper mentions software tools and libraries like 'ChatGPT [43]', 'MPNet [49]', 'Optimal Transport [7, 45]' and 'OpenCLIP library [21]', but it does not provide specific version numbers for any of these, which are necessary for full reproducibility.
Experiment Setup	Yes	For a fair comparison, SWAB follows ModelGPT [73] to sequentially extract a target dataset from each of the 23 datasets in the LOVM Benchmark and treat the remaining datasets as open-source datasets. Besides, SWAB adopts ModelGPT's approach of adding Gaussian noise to corrupt the target dataset's generated text embeddings. ... We conduct ten repeated experiments using random seeds from 1 to 10 and report the mean value and standard deviation of ModelGPT's performance and SWAB's performance in Table 1. ... D.1 Filtering the Open-Source Tasks Classes: λ is a threshold and we set λ = 0.5. ... D.2 Using Partial Optimal Transport for Bridging the Capability Gap: We set mass = 0.9 in our implementation. ... D.3 Data Normalization in Bridging the Modality Gap: describes z-score normalization.
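
The evaluation protocol quoted in the Experiment Setup row (hold out each dataset in turn, corrupt embeddings with Gaussian noise, repeat over seeds 1 to 10, report mean and standard deviation, apply z-score normalization) can be sketched roughly as follows. This is only an illustrative outline using the Python standard library, not the code from the SWAB repository. All function names are hypothetical, the noise scale `sigma` is an assumed parameter the excerpt does not specify, and the choice of population standard deviation inside `z_score` is likewise an assumption.

```python
import random
import statistics

SEEDS = range(1, 11)  # the excerpt reports random seeds from 1 to 10


def leave_one_out_splits(dataset_names):
    """Yield (target, open_source) pairs, holding out each dataset once."""
    for i, target in enumerate(dataset_names):
        yield target, dataset_names[:i] + dataset_names[i + 1:]


def add_gaussian_noise(embedding, sigma, rng):
    """Corrupt an embedding with zero-mean Gaussian noise.

    sigma is an assumed parameter; the excerpt does not give its value.
    """
    return [x + rng.gauss(0.0, sigma) for x in embedding]


def z_score(values):
    """Standardize values to zero mean and unit (population) std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def summarize(scores):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.fmean(scores), statistics.stdev(scores)
```

A driver would loop over `SEEDS`, create `random.Random(seed)` for each run, iterate over `leave_one_out_splits(...)` for the 23 benchmark datasets, and pass the per-seed scores to `summarize`. The scoring step itself (the ranker and the optimal transport components) is outside the scope of this sketch.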