On the Effectiveness of Parameter-Efficient Fine-Tuning
Authors: Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments on several tasks. The experimental results show that our proposed SAM model outperforms many strong baseline models and also verify our theoretical analysis. |
| Researcher Affiliation | Collaboration | Language Technology Lab, University of Cambridge; The Chinese University of Hong Kong; DAMO Academy, Alibaba Group |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The source code of this paper can be obtained from https://github.com/fuzihaofzh/AnalyzeParameterEfficientFinetune |
| Open Datasets | Yes | We build our models with the jiant framework and test our models on several GLUE (Wang et al. 2018) and SuperGLUE (Wang et al. 2019) tasks. ... we choose several tasks including the Corpus of Linguistic Acceptability (CoLA) (Warstadt, Singh, and Bowman 2019), Semantic Textual Similarity Benchmark (STSB) (Cer et al. 2017), Microsoft Research Paraphrase Corpus (MRPC) (Dolan and Brockett 2005), Recognizing Textual Entailment (RTE) (Dagan, Glickman, and Magnini 2005; Bentivogli et al. 2009), Commitment Bank (CB) (De Marneffe, Simons, and Tonhauser 2019), Choice of Plausible Alternatives (COPA) (Roemmele, Bejan, and Gordon 2011), and Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2012). (A hedged loading sketch follows the table.) |
| Dataset Splits | Yes | Different from many previous works that train models without validation, we split the original training set by randomly sampling 10% as the new development set while using the remaining 90% of samples to train the model. (A split sketch follows the table.) |
| Hardware Specification | Yes | We run the models on NVIDIA TITAN RTX GPU with 24GB memory. |
| Software Dependencies | No | The paper mentions 'jiant framework', 'Adapter Hub', 'loralib', and 'transformers toolkit' but does not provide specific version numbers for these software dependencies. (A version-recording sketch follows the table.) |
| Experiment Setup | Yes | Instead of training the model for a fixed number of epochs, we use the new development set to do early-stop training, setting the tolerance for all models to 40. ... Following the setting of Guo, Rush, and Kim (2021), we set the sparsity to 0.005 for all models for a fair comparison. In SAM, we calculate ∇L(θ₀)_i by accumulating the gradient over a few burn-in steps, as we cannot load all the training data into memory; the burn-in steps are chosen from {500, 600, 700, 800, 900, 1000, 2000} on the development set as a hyper-parameter. (Hedged early-stopping and burn-in sketches follow the table.) |
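
The GLUE and SuperGLUE tasks listed in the "Open Datasets" row are publicly available. A minimal loading sketch is given below, assuming the Hugging Face `datasets` library rather than the paper's jiant setup; the task configuration names (`"cola"`, `"cb"`, etc.) are the library's identifiers, not details taken from the paper.

```python
# Hedged sketch: loading the GLUE / SuperGLUE tasks named in the table
# with the Hugging Face `datasets` library (the paper itself uses jiant).
from datasets import load_dataset

glue_tasks = ["cola", "stsb", "mrpc", "rte"]   # CoLA, STS-B, MRPC, RTE
superglue_tasks = ["cb", "copa", "wsc"]        # CB, COPA, WSC

benchmarks = {}
for task in glue_tasks:
    benchmarks[task] = load_dataset("glue", task)
for task in superglue_tasks:
    benchmarks[task] = load_dataset("super_glue", task)

# Report training-set sizes as a quick sanity check.
print({name: len(ds["train"]) for name, ds in benchmarks.items()})
```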
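
The 90%/10% train/development split described in the "Dataset Splits" row could be reproduced along the following lines; the random seed and the use of `train_test_split` are assumptions for illustration, not details reported in the paper.

```python
# Hedged sketch: carve a 10% development set out of the original training
# split, keeping the remaining 90% for training (seed is an assumption).
from datasets import load_dataset

raw = load_dataset("glue", "rte")
split = raw["train"].train_test_split(test_size=0.1, seed=42)
train_set, dev_set = split["train"], split["test"]   # 90% train, 10% dev
print(len(train_set), len(dev_set))
```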
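
Because no version numbers are reported for the toolkits in the "Software Dependencies" row, a small script such as the one below could be used to record them in one's own environment; the pip distribution names (e.g. `adapter-transformers` for Adapter Hub) are assumptions, not names confirmed by the paper.

```python
# Hedged sketch: record the versions of the toolkits named in the paper.
# The distribution names below are assumptions about the pip packages used.
from importlib import metadata

for pkg in ["jiant", "transformers", "adapter-transformers", "loralib", "torch"]:
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```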
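
The early-stopping rule with a tolerance of 40 from the "Experiment Setup" row might look roughly like this; `train_step`, `evaluate_on_dev`, and the loss being minimized are placeholders, not the authors' implementation.

```python
# Hedged sketch: early stopping on the new development set with a
# tolerance (patience) of 40 evaluations without improvement.
def train_with_early_stopping(train_step, evaluate_on_dev, max_steps=100_000, tolerance=40):
    best_dev_loss = float("inf")
    steps_since_improvement = 0
    for _ in range(max_steps):
        train_step()                      # one optimization step (placeholder)
        dev_loss = evaluate_on_dev()      # evaluate on the held-out 10% dev set
        if dev_loss < best_dev_loss:
            best_dev_loss = dev_loss
            steps_since_improvement = 0
        else:
            steps_since_improvement += 1
            if steps_since_improvement >= tolerance:
                break                     # no improvement for 40 evaluations
    return best_dev_loss
```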
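
The burn-in accumulation of ∇L(θ₀) and the 0.005 sparsity level could be sketched in PyTorch as below. This is an illustrative reading of the description in the table, not the authors' released SAM code; the function and argument names are hypothetical.

```python
# Hedged sketch: accumulate |gradient| at the pretrained point over a few
# burn-in steps, then keep only the top `sparsity` fraction of parameters
# trainable (an illustrative reading of the SAM setup, not the released code).
import torch

def select_trainable_parameters(model, batches, compute_loss, burn_in_steps=1000, sparsity=0.005):
    accumulated = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
    batch_iter = iter(batches)
    for _ in range(burn_in_steps):
        model.zero_grad()
        loss = compute_loss(model, next(batch_iter))
        loss.backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                accumulated[name] += p.grad.abs()
    # Threshold so that roughly `sparsity` of all parameters stay trainable.
    all_scores = torch.cat([s.flatten() for s in accumulated.values()])
    k = max(1, int(sparsity * all_scores.numel()))
    threshold = torch.topk(all_scores, k).values.min()
    return {name: (score >= threshold) for name, score in accumulated.items()}
```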