Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

Authors: Shibo Jie, Yehui Tang, Ning Ding, Zhi-Hong Deng, Kai Han, Yunhe Wang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results across various VL tasks and language models reveal that MemVP significantly reduces the training time and inference latency of the finetuned VL models and surpasses the performance of previous PEFT methods.
Researcher Affiliation | Collaboration | (1) School of Intelligence Science and Technology, Peking University; (2) Huawei Noah's Ark Lab; (3) National Key Laboratory of General Artificial Intelligence.
Pseudocode | No | The paper provides mathematical formulations and diagrams but does not include explicit pseudocode or algorithm blocks. (A hedged sketch of the method follows this table.)
Open Source Code | Yes | Code: https://github.com/JieShibo/MemVP
Open Datasets | Yes | For visual question answering, we evaluate our method on VQAv2 (Goyal et al., 2017) and GQA (Hudson & Manning, 2019); for image captioning, we evaluate on COCO Captions (Chen et al., 2015). Additionally, we use a challenging VQA task, ScienceQA (Lu et al., 2022).
Dataset Splits | No | The paper mentions reporting results on the validation sets of TVQA and How2QA (Appendix B.1) and on the test set or test-dev split for VQAv2, GQA, COCO Captions, and ScienceQA, but it does not specify how the training, validation, and test splits were constructed (e.g., percentages or sample counts) for all datasets, so the data partitioning cannot be fully reproduced.
Hardware Specification | Yes | We show the inference speed across different lengths of input and output on LLaMA-7B on a single V100. Measured on V100 GPUs. Measured on 8 A800 GPUs.
Software Dependencies | No | The paper does not specify the versions of software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | We train on each dataset for 20 epochs with batch size 8 64 and report performance on the test set. The hyperparameters of all methods are summarized in the Appendix. Table 5: Hyperparameters on BART-base and T5-base (Learning Rate, Batch Size, Epoch, Structure Hyper-Parameters). Table 7: Hyperparameters on LLaMA (Learning Rate, Batch Size, Epoch, Structure Hyper-Parameters).
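
Because the paper conveys memory-space visual prompting only through equations and figures, the PyTorch snippet below is a minimal sketch of the core idea as we read it: visual features from a frozen vision encoder are projected and appended as extra key-value slots of the language model's FFN, so no visual tokens are prepended to the input sequence. This is an illustrative sketch, not the authors' implementation; the module layout, the GELU activation, and the `scale` factor are assumptions, and the official repository linked above should be treated as authoritative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MemVPFFN(nn.Module):
    """Sketch of memory-space visual prompting (assumed layout, not the official code).

    A Transformer FFN, FFN(x) = act(x @ W1) @ W2, is viewed as key-value memory.
    Projected visual features are appended as extra key/value slots, so image
    information is retrieved inside the FFN instead of being added as input tokens.
    """

    def __init__(self, d_model: int, d_ffn: int, d_visual: int, scale: float = 0.1):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_ffn)      # frozen W1 (memory keys)
        self.fc2 = nn.Linear(d_ffn, d_model)      # frozen W2 (memory values)
        self.proj = nn.Linear(d_visual, d_model)  # lightweight trainable projector
        self.scale = scale                        # scaling factor (assumed value)

    def forward(self, x: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # x:            (batch, seq_len, d_model)    text hidden states
        # visual_feats: (batch, n_patches, d_visual) features from a frozen vision encoder
        z = self.proj(visual_feats)                          # (batch, n_patches, d_model)

        h_text = F.gelu(self.fc1(x))                         # standard memory lookup
        h_vis = F.gelu(self.scale * x @ z.transpose(1, 2))   # scores against visual "keys"

        # Retrieved text memory plus retrieved visual memory (z reused as "values").
        return self.fc2(h_text) + h_vis @ z
```

In this reading, only the projector (and any scaling) would be trained while the FFN weights stay frozen, which is consistent with the reduced training cost reported in the paper; the exact placement of the activation and scaling should be verified against the released code.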