Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Bridge the Modality and Capability Gaps in Vision-Language Model Selection
Authors: Chao Yi, Yuhang He, De-Chuan Zhan, Han-Jia Ye
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across various VLMs and image classification datasets validate SWAB's effectiveness. Code is available at: https://github.com/YCaigogogo/SWAB. |
| Researcher Affiliation | Academia | Chao Yi, Yu-Hang He, De-Chuan Zhan, Han-Jia Ye. State Key Laboratory for Novel Software Technology, Nanjing University. EMAIL |
| Pseudocode | Yes | Algorithm 1 shows the pseudo-code of SWAB. |
| Open Source Code | Yes | Code is available at: https://github.com/YCaigogogo/SWAB. |
| Open Datasets | Yes | We evaluate different methods on 23 datasets, i.e. ImageNet [8], Aircraft [36], CIFAR100 [26] and so on. ... Table 8: Detailed information of 23 tasks used in the LOVM Benchmark. This table comes from [73]. |
| Dataset Splits | No | The paper describes a training phase for the ranker model on open-source datasets and testing on target datasets, but it does not explicitly provide details about a dedicated 'validation' split with percentages or sample counts for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, memory) used for the experiments. |
| Software Dependencies | No | The paper mentions software tools and libraries like 'ChatGPT [43]', 'MPNet [49]', 'Optimal Transport [7, 45]' and 'OpenCLIP library [21]', but it does not provide specific version numbers for any of these, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For a fair comparison, SWAB follows ModelGPT [73] to sequentially extract a target dataset from each of the 23 datasets in the LOVM Benchmark and treat the remaining datasets as open-source datasets. Besides, SWAB adopts ModelGPT's approach of adding Gaussian noise to corrupt the target dataset's generated text embeddings. ... We conduct ten repeated experiments using random seeds from 1 to 10 and report the mean value and standard deviation of ModelGPT's performance and SWAB's performance in Table 1. ... D.1 Filtering the Open-Source Tasks Classes: λ is a threshold and we set λ = 0.5. ... D.2 Using Partial Optimal Transport for Bridging the Capability Gap: We set mass = 0.9 in our implementation. ... D.3 Data Normalization in Bridging the Modality Gap: describes z-score normalization. |
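
The protocol quoted in the Experiment Setup row (each dataset taken in turn as the target with the rest treated as open-source datasets, ten runs with seeds 1 to 10, mean and standard deviation reported, z-score normalization) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the `toy_run` scoring stub are hypothetical, and the quoted text does not say whether the sample or the population standard deviation is reported, so the sample version used here is an assumption.

```python
import random
import statistics
from typing import Callable, Iterable, List, Sequence, Tuple


def leave_one_out_splits(datasets: Sequence[str]) -> List[Tuple[str, List[str]]]:
    """Pair each dataset (the target) with all remaining ones (the open-source pool)."""
    return [(target, [d for d in datasets if d != target]) for target in datasets]


def z_score(values: Sequence[float]) -> List[float]:
    """Standardize values to zero mean and unit population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: nothing to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def repeat_over_seeds(
    run: Callable[[int], float],
    seeds: Iterable[int] = range(1, 11),
) -> Tuple[float, float]:
    """Call run(seed) once per seed; return (mean, sample standard deviation)."""
    scores = [run(seed) for seed in seeds]
    return statistics.fmean(scores), statistics.stdev(scores)


def toy_run(seed: int) -> float:
    """Hypothetical stand-in for one seeded evaluation; returns a made-up score."""
    rng = random.Random(seed)
    return 0.5 + 0.01 * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    # Three names stand in for the 23 benchmark datasets.
    for target, pool in leave_one_out_splits(["ImageNet", "Aircraft", "CIFAR100"]):
        mean, std = repeat_over_seeds(toy_run)
        print(f"target={target} pool={pool} score={mean:.4f} +/- {std:.4f}")
```

Passing the seed into the run function, rather than setting global random state, keeps each of the ten repetitions independently reproducible.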