LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention

Authors: Renrui Zhang, Jiaming Han, Chris Liu, Aojun Zhou, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 4.1, we first evaluate the language instruction-following capacity of LLaMA-Adapter. Then, we present our multi-modal reasoning performance on several benchmarks in Section 4.2, and conduct ablation studies on ScienceQA's validation set in Section 4.3. Finally, we report the fine-tuning results of our approach on traditional vision and language models in Section 4.4.
Researcher Affiliation | Collaboration | Renrui Zhang 1,2, Jiaming Han 1,2, Chris Liu 1, Aojun Zhou 2, Pan Lu 3, Yu Qiao 1, Hongsheng Li 2,4, Peng Gao 1; 1 Shanghai Artificial Intelligence Laboratory, 2 CUHK MMLab, 3 University of California, Los Angeles, 4 CPII of InnoHK; {zhangrenrui, hanjiaming, gaopeng, qiaoyu}@pjlab.org.cn, hsli@ee.cuhk.edu.hk
Pseudocode | No | The paper describes the mechanism using text and equations, but does not provide a formal pseudocode or algorithm block; a hedged sketch of the zero-initialized attention mechanism is given after this table.
Open Source Code | Yes | Code and models are released at https://github.com/OpenGVLab/LLaMA-Adapter.
Open Datasets | Yes | Following Stanford Alpaca (Taori et al., 2023), we utilize 52K instruction-following data for training. We fine-tune LLaMA-Adapter on 8 A100 GPUs for 5 epochs. ... ScienceQA (Lu et al., 2022) Evaluation. ... we utilize the raw image-caption data from LAION-400M (Schuhmann et al., 2021)... We select a pre-trained ViT/16 (Dosovitskiy et al., 2020) as the vision model and evaluate on VTAB-1k (Zhai et al., 2019) benchmark...
Dataset Splits | Yes | In Section 4.1, we first evaluate the language instruction-following capacity of LLaMA-Adapter. Then, we present our multi-modal reasoning performance on several benchmarks in Section 4.2, and conduct ablation studies on ScienceQA's validation set in Section 4.3. ... Exact Match (EM) and F1 scores on the dev set are reported.
Hardware Specification | Yes | Thanks to our lightweight adaption modules with zero-initialized gating, the training convergence of LLaMA-Adapter costs less than one hour on 8 A100 GPUs, which is three times faster than Alpaca.
Software Dependencies | No | The paper mentions several models and datasets but does not provide specific version numbers for software dependencies or libraries used for implementation.
Experiment Setup | Yes | The warmup epochs, batch size, learning rate, and weight decay are set to 2, 64, 0.009, and 0.02, respectively. By default, we utilize the pre-trained LLaMA model with 7B parameters and N = 32 transformer layers. We adopt a prompt length K = 10 and insert the adaption prompts into the last L = 30 layers. A hedged configuration sketch collecting these values appears after this table.
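
Since the paper provides no pseudocode, the following is a minimal, hypothetical PyTorch sketch of the zero-initialized attention it describes in equations: learnable adaption prompts act as extra key/value tokens in the topmost transformer layers, their attention scores are softmax-ed separately from the word-token scores, and a zero-initialized gating factor scales their contribution so the frozen pre-trained behavior is preserved at the start of fine-tuning. The single-head module and names below (ZeroInitAttention, prompt_len, gate) are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ZeroInitAttention(nn.Module):
    """Simplified, single-head sketch of zero-initialized attention with
    adaption prompts. Hypothetical illustration only; the released
    LLaMA-Adapter code operates inside LLaMA's multi-head attention and
    keeps the pre-trained projection weights frozen."""

    def __init__(self, dim: int, prompt_len: int = 10):
        super().__init__()
        self.dim = dim
        # Learnable adaption prompt, used as extra key/value tokens.
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Zero-initialized gating factor: at the start of fine-tuning the
        # prompt contributes nothing, preserving the pre-trained behavior.
        self.gate = nn.Parameter(torch.zeros(1))
        # Stand-ins for the frozen q/k/v projections of a LLaMA layer.
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) word tokens; causal masking omitted.
        q, k, v = self.wq(x), self.wk(x), self.wv(x)

        # Project the adaption prompt into keys and values as well.
        p = self.prompt.unsqueeze(0).expand(x.size(0), -1, -1)
        pk, pv = self.wk(p), self.wv(p)

        scale = self.dim ** -0.5
        s_tokens = torch.einsum("bqd,bkd->bqk", q, k) * scale   # scores over word tokens
        s_prompt = torch.einsum("bqd,bkd->bqk", q, pk) * scale  # scores over prompt tokens

        # Softmax the two score groups independently, then gate the prompt
        # part by the zero-initialized factor g.
        a_tokens = F.softmax(s_tokens, dim=-1)
        a_prompt = F.softmax(s_prompt, dim=-1) * self.gate

        out = torch.einsum("bqk,bkd->bqd", a_tokens, v)
        out = out + torch.einsum("bqk,bkd->bqd", a_prompt, pv)
        return out
```

With the gate at its zero initialization, the output equals ordinary self-attention over the word tokens; during fine-tuning the gate learns how much the adaption prompts should contribute.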
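
For quick reference, the reported training setup can be collected in one place. The dictionary below is a hypothetical summary of the values quoted above; the key names are illustrative assumptions and do not correspond to flags in the released repository.

```python
# Hypothetical summary of the reported fine-tuning setup for LLaMA-Adapter;
# key names are illustrative, not the released code's arguments.
llama_adapter_config = {
    "base_model": "LLaMA-7B",       # N = 32 transformer layers
    "adapted_layers": 30,           # adaption prompts inserted into the last L = 30 layers
    "prompt_length": 10,            # K = 10 learnable tokens per adapted layer
    "instruction_data": "52K Alpaca instruction-following samples",
    "epochs": 5,
    "warmup_epochs": 2,
    "batch_size": 64,
    "learning_rate": 0.009,
    "weight_decay": 0.02,
    "hardware": "8x A100 GPUs",     # training reported to converge in under one hour
}
```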