p-Laplacian Adaptation for Generative Pre-trained Vision-Language Models
Authors: Haoyuan Wu, Xinyun Zhang, Peng Xu, Peiyu Liao, Xufeng Yao, Bei Yu
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on different pretrained VLMs and multi-modal tasks, including visual question answering, visual entailment, and image captioning. The experimental results validate our method's significant superiority over other PETL methods. |
| Researcher Affiliation | Academia | Department of Computer Science & Engineering, The Chinese University of Hong Kong; wuhyhowell@gmail.com, xyzhang21@cse.cuhk.edu.hk, byu@cse.cuhk.edu.hk |
| Pseudocode | No | The paper describes the proposed method using mathematical equations and diagrams, but it does not include pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/wuhy68/p-Adapter/ |
| Open Datasets | Yes | We test our model on VQA2.0 (Goyal et al. 2017) ... SNLI-VE (Xie et al. 2019) ... COCO Captions (Lin et al. 2014) ... TextCaps (Sidorov et al. 2020), and VizWiz Captions (Gurari et al. 2020). |
| Dataset Splits | Yes | We test our model on VQA2.0 (Goyal et al. 2017) with the widely-used Karpathy split (Karpathy and Fei-Fei 2015) and VizWiz VQA (Gurari et al. 2018). ... COCO Captions (Lin et al. 2014) with Karpathy split (Karpathy and Fei-Fei 2015) |
| Hardware Specification | Yes | Our experiments are implemented in PyTorch (Paszke et al. 2019) and conducted on 8 Nvidia 3090 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch (Paszke et al. 2019)" but does not give version numbers for PyTorch or any other software dependencies, which are needed for exact reproduction. |
| Experiment Setup | Yes | We use the AdamW (Loshchilov and Hutter 2017) optimizer with a weight decay of 0.05 and apply a linear scheduler. We take random image crops of resolution 224×224 as the input of the encoder, and also apply RandAugment (Cubuk et al. 2020) during training... We train the model for five epochs for VQA and VE, and two epochs for image captioning. We sweep a wide range of learning rates over {1×10⁻⁴, 2×10⁻⁴, 5×10⁻⁴, 1×10⁻³} for PETL methods, and use 2×10⁻⁵ for full fine-tuning. |
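
For reference, the quoted setup maps onto a standard PyTorch training configuration. The sketch below assembles the reported optimizer, scheduler, and augmentation settings; the model, data pipeline, the choice of RandomResizedCrop, and the helper name `build_optimizer_and_scheduler` are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the quoted fine-tuning setup (hypothetical helper; only the
# hyperparameters quoted above come from the paper's experiment description).
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR
from torchvision import transforms

# Input pipeline: random 224x224 crops plus RandAugment (Cubuk et al. 2020).
# RandomResizedCrop is an assumption; the paper only states "random image crops".
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandAugment(),
    transforms.ToTensor(),
])

def build_optimizer_and_scheduler(model, steps_per_epoch, epochs=5, lr=1e-4):
    """AdamW with weight decay 0.05 and a linearly decaying learning rate.

    lr is swept over {1e-4, 2e-4, 5e-4, 1e-3} for PETL methods (2e-5 for full
    fine-tuning); epochs is 5 for VQA and VE, 2 for image captioning.
    """
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=0.05)
    # Decay the learning rate linearly to zero over all training steps.
    scheduler = LinearLR(optimizer, start_factor=1.0, end_factor=0.0,
                         total_iters=epochs * steps_per_epoch)
    return optimizer, scheduler
```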