LLaMA-Adapter: Efficient Fine-tuning of Large Language Models with Zero-initialized Attention
Authors: Renrui Zhang, Jiaming Han, Chris Liu, Aojun Zhou, Pan Lu, Yu Qiao, Hongsheng Li, Peng Gao
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4.1, we first evaluate the language instruction-following capacity of LLaMA-Adapter. Then, we present our multi-modal reasoning performance on several benchmarks in Section 4.2, and conduct ablation studies on ScienceQA's validation set in Section 4.3. Finally, we report the fine-tuning results of our approach on traditional vision and language models in Section 4.4. |
| Researcher Affiliation | Collaboration | Renrui Zhang (1,2), Jiaming Han (1,2), Chris Liu (1), Aojun Zhou (2), Pan Lu (3), Yu Qiao (1), Hongsheng Li (2,4), Peng Gao (1). Affiliations: (1) Shanghai Artificial Intelligence Laboratory, (2) CUHK MMLab, (3) University of California, Los Angeles, (4) CPII of InnoHK. Emails: {zhangrenrui, hanjiaming, gaopeng, qiaoyu}@pjlab.org.cn, hsli@ee.cuhk.edu.hk |
| Pseudocode | No | The paper describes the mechanism using text and equations, but does not provide a formal pseudocode or algorithm block (an illustrative sketch of the mechanism is given below the table). |
| Open Source Code | Yes | Code and models are released at https://github.com/OpenGVLab/LLaMA-Adapter. |
| Open Datasets | Yes | Following Stanford Alpaca (Taori et al., 2023), we utilize 52K instruction-following data for training. We fine-tune LLaMA-Adapter on 8 A100 GPUs for 5 epochs. ... ScienceQA (Lu et al., 2022) Evaluation. ... we utilize the raw image-caption data from LAION-400M (Schuhmann et al., 2021)... We select a pre-trained ViT/16 (Dosovitskiy et al., 2020) as the vision model and evaluate on the VTAB-1k (Zhai et al., 2019) benchmark... |
| Dataset Splits | Yes | In Section 4.1, we first evaluate the language instruction-following capacity of LLaMA-Adapter. Then, we present our multi-modal reasoning performance on several benchmarks in Section 4.2, and conduct ablation studies on ScienceQA's validation set in Section 4.3. ... Exact Match (EM) and F1 scores on the dev set are reported. |
| Hardware Specification | Yes | Thanks to our lightweight adaption modules with zero-initialized gating, the training convergence of LLaMA-Adapter costs less than one hour on 8 A100 GPUs, which is three times faster than Alpaca. |
| Software Dependencies | No | The paper mentions several models and datasets but does not provide specific version numbers for software dependencies or libraries used for implementation. |
| Experiment Setup | Yes | The warmup epochs, batch size, learning rate, and weight decay are set to 2, 64, 0.009, and 0.02, respectively. By default, we utilize the pre-trained LLaMA model with 7B parameters and N = 32 transformer layers. We adopt a prompt length K = 10 and insert the adaption prompts into the last L = 30 layers. (These values are collected in the configuration sketch below the table.) |
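
Since the paper provides no pseudocode, the zero-initialized attention mechanism referenced in the "Pseudocode" row can be sketched as follows. This is a minimal, single-head PyTorch sketch and not the authors' implementation: it omits the multi-head attention, rotary embeddings, and causal masking of the frozen LLaMA blocks, and the module and parameter names (`ZeroInitAttention`, `prompt`, `gate`) are illustrative assumptions. The core idea it reproduces is that learnable adaption prompts are prepended only to the keys and values, their attention branch is softmaxed separately, and a learnable gating factor initialized to zero scales that branch so the pre-trained behavior is untouched at the start of fine-tuning.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class ZeroInitAttention(nn.Module):
    """Single-head sketch of zero-initialized attention with adaption prompts."""

    def __init__(self, dim: int, prompt_len: int = 10):
        super().__init__()
        # Learnable adaption prompt (K x C), used only as extra keys/values.
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)
        # Gating factor initialized to zero: the prompt contributes nothing
        # at the start of fine-tuning, preserving the pre-trained model.
        self.gate = nn.Parameter(torch.zeros(1))
        self.wq = nn.Linear(dim, dim, bias=False)
        self.wk = nn.Linear(dim, dim, bias=False)
        self.wv = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim) word tokens entering an adapted layer.
        b, m, c = x.shape
        q = self.wq(x)                                        # (b, m, c)
        k_words, v_words = self.wk(x), self.wv(x)             # (b, m, c)
        prompt = self.prompt.unsqueeze(0).expand(b, -1, -1)   # (b, K, c)
        k_prompt, v_prompt = self.wk(prompt), self.wv(prompt)

        scale = 1.0 / math.sqrt(c)
        s_prompt = q @ k_prompt.transpose(1, 2) * scale       # (b, m, K)
        s_words = q @ k_words.transpose(1, 2) * scale         # (b, m, m)

        # Independent softmaxes; the prompt branch is rescaled by tanh(gate),
        # which equals 0 at initialization (zero-init gating).
        a_prompt = F.softmax(s_prompt, dim=-1) * torch.tanh(self.gate)
        a_words = F.softmax(s_words, dim=-1)

        return a_prompt @ v_prompt + a_words @ v_words        # (b, m, c)
```

In the paper this gated branch is applied only in the last L layers of the frozen LLaMA backbone; the earlier layers are left unchanged.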
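
The hyperparameters quoted in the "Experiment Setup" row can also be collected in one place. The dictionary below is only a convenience summary of the reported values; the key names are assumptions for illustration and do not reflect the configuration schema of the official repository.

```python
# Hypothetical summary of the reported LLaMA-Adapter (7B) fine-tuning settings.
llama_adapter_7b_config = dict(
    llama_size="7B",        # frozen backbone with N = 32 transformer layers
    adapted_layers=30,      # L: adaption prompts inserted into the last 30 layers
    prompt_len=10,          # K: learnable prompt tokens per adapted layer
    epochs=5,
    warmup_epochs=2,
    batch_size=64,
    learning_rate=9e-3,
    weight_decay=0.02,
    hardware="8x A100 GPUs",
)
```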