Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
Authors: Shibo Jie, Yehui Tang, Ning Ding, Zhi-Hong Deng, Kai Han, Yunhe Wang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results across various VL tasks and language models reveal that MemVP significantly reduces the training time and inference latency of the fine-tuned VL models and surpasses the performance of previous PEFT methods. |
| Researcher Affiliation | Collaboration | 1. School of Intelligence Science and Technology, Peking University; 2. Huawei Noah's Ark Lab; 3. National Key Laboratory of General Artificial Intelligence. |
| Pseudocode | No | The paper provides mathematical formulations and diagrams but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/JieShibo/MemVP |
| Open Datasets | Yes | For visual question answering, we evaluate our method on VQAv2 (Goyal et al., 2017) and GQA (Hudson & Manning, 2019); for image captioning, we evaluate on COCO Captions (Chen et al., 2015). Additionally, we use a challenging VQA task, ScienceQA (Lu et al., 2022). |
| Dataset Splits | No | The paper reports results on the validation sets of TVQA and How2QA (Appendix B.1) and on the test set or test-dev split of VQAv2, GQA, COCO Captions, and ScienceQA, but it does not specify how the training, validation, and test splits were constructed (e.g., percentages or sample counts) for all datasets, which would be needed to fully reproduce the data partitioning. |
| Hardware Specification | Yes | We show the inference speed across different lengths of input and output on LLaMA-7B on a single V100. Measured on V100 GPUs. Measured on 8 A800 GPUs. |
| Software Dependencies | No | The paper does not specify the versions of software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train on each dataset for 20 epochs with batch size 8 64 and report performance on the test set. The hyperparameters of all methods are summarized in the Appendix. Table 5: Hyperparameters on BART-base and T5-base (Learning Rate, Batch Size, Epoch, Structure Hyper-Parameters). Table 7: Hyperparameters on LLaMA (Learning Rate, Batch Size, Epoch, Structure Hyper-Parameters). |