Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ToVE: Efficient Vision-Language Learning via Knowledge Transfer from Vision Experts
Authors: Yuanchen Wu, Junlong Du, Ke Yan, Shouhong Ding, Xiaoqiang Li
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiment results across various VL tasks demonstrate that the proposed ToVE achieves competitive performance with two orders of magnitude fewer training data. |
| Researcher Affiliation | Collaboration | (1) School of Computer Engineering and Science, Shanghai University, Shanghai; (2) Tencent Youtu Lab, Shanghai. EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methodology in natural language and mathematical formulas, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The pre-training dataset is composed of two in-domain datasets (i.e., COCO (Lin et al., 2014) and Visual Genome (Krishna et al., 2017)) and one web dataset (i.e., CC3M (Sharma et al., 2018)). |
| Dataset Splits | Yes | Fine-tuned caption performance on COCO (Karpathy split) and NoCaps (validation set). Fine-tuned VQA performance on VQA v2 (test set). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models or CPU specifications. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer, but it does not specify version numbers for any software libraries or frameworks like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | All our models are trained using the AdamW optimizer with a weight decay of 0.05. Automated data augmentation (AutoAug) is applied during both the pre-training and fine-tuning stages. For pre-training, the learning rate is set to 3e-4 with a total of 10 epochs. During fine-tuning for VQA, we use a learning rate of 1e-5 and train for 10 epochs. For fine-tuning the captioning model, the learning rate is 1e-5 with a total of 3 epochs. |
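
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object for anyone attempting a re-run. The sketch below is illustrative only: the class name `StageConfig`, its field names, and the stage keys are assumptions made for this example and do not come from the paper. It records only the values quoted above; settings the summary does not report (batch size, learning-rate schedule, hardware, library versions) are deliberately left out rather than guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one training stage (names are hypothetical)."""
    learning_rate: float
    epochs: int
    optimizer: str = "AdamW"     # reported for all models
    weight_decay: float = 0.05   # reported for all models
    auto_augment: bool = True    # AutoAug reported for pre-training and fine-tuning


# Values taken from the Experiment Setup row; stage keys are illustrative.
STAGES = {
    "pretrain": StageConfig(learning_rate=3e-4, epochs=10),
    "finetune_vqa": StageConfig(learning_rate=1e-5, epochs=10),
    "finetune_caption": StageConfig(learning_rate=1e-5, epochs=3),
}

if __name__ == "__main__":
    for name, cfg in STAGES.items():
        print(f"{name}: optimizer={cfg.optimizer}, lr={cfg.learning_rate:g}, "
              f"epochs={cfg.epochs}, weight_decay={cfg.weight_decay}")
```

Keeping the shared settings as defaults and the per-stage settings as required fields makes it easy to see which values differ between pre-training and the two fine-tuning runs.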