LIVE: Learnable In-Context Vector for Visual Question Answering
Authors: Yingzhe Peng, Chenduo Hao, Xinting Hu, Jiawei Peng, Xin Geng, Xu Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that LIVE can significantly reduce computational costs while enhancing accuracy in VQA tasks compared to traditional ICL and other non-learnable ICV methods. |
| Researcher Affiliation | Academia | 1 Southeast University 2 Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China {yingzhe.peng, 213201447, pengjiawei, xgeng, xuyang_palm}@seu.edu.cn 3 Nanyang Technological University xinting001@e.ntu.edu.sg |
| Pseudocode | No | The paper describes the method mathematically and shows a training pipeline diagram (Figure 2), but no explicit pseudocode block or algorithm listing. |
| Open Source Code | Yes | The code is available at https://github.com/ForJadeForest/LIVE-Learnable-In-Context-Vector. |
| Open Datasets | Yes | We evaluate our approach using the IDEFICS-9B model [9] across two datasets: VQAv2 [47] and OKVQA [48]. |
| Dataset Splits | Yes | For both VQAv2 and OKVQA datasets, we train our LIVE on 8,000 pairs from each training set. Due to computational resource limitations, we randomly sample 10,000 question-answer pairs from the VQAv2 validation split for evaluation [18]. For OKVQA, we utilize the entire validation split. |
| Hardware Specification | Yes | During the inference process, we utilize two Xeon Silver 3414 CPUs, one RTX 3090 GPU, and 384 GB of memory. |
| Software Dependencies | No | The paper mentions 'optimizer AdamW [52]' but does not specify version numbers for crucial software libraries such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Table 7 (VQAv2 and OKVQA LIVE training parameters, listed as VQAv2 / OKVQA): optimizer AdamW [52] / AdamW; learning rate of α 1e-2 / 1e-2; learning rate of V 1e-3 / 5e-3; λ 0.5 / 0.5; weight decay 1e-3 / 1e-3; precision FP16 / FP16; batch size 2 / 2; warm-up 0.1 / 0.1; accumulate batches 8 / 8; number of epochs 10 / 10. |
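The Table 7 hyperparameters can be collected into a plain configuration dict, which makes the single cross-dataset difference (the learning rate of V) explicit. This is a hypothetical sketch for illustration only; the paper and its repository do not publish a config in this form, and the key names (`lr_alpha`, `lr_V`, etc.) are our own.

```python
def live_train_config(dataset: str) -> dict:
    """Hypothetical config dict reproducing the Table 7 hyperparameters.

    `dataset` is "vqav2" or "okvqa"; key names are illustrative, not from
    the paper's codebase.
    """
    config = {
        "optimizer": "AdamW",
        "lr_alpha": 1e-2,          # learning rate of the scaling factors alpha
        "lr_V": 1e-3,              # learning rate of the in-context vectors V
        "lambda": 0.5,             # loss-weighting coefficient lambda
        "weight_decay": 1e-3,
        "precision": "fp16",
        "batch_size": 2,
        "warmup_ratio": 0.1,       # warm-up fraction of training steps
        "accumulate_batches": 8,   # effective batch size = 2 * 8 = 16
        "num_epochs": 10,
    }
    if dataset.lower() == "okvqa":
        config["lr_V"] = 5e-3      # the only value that differs from VQAv2
    return config
```

Per Table 7, every value is shared across the two datasets except the learning rate of V, so a single base dict with one override keeps the two setups consistent.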