On the Effectiveness of Parameter-Efficient Fine-Tuning

Authors: Zihao Fu, Haoran Yang, Anthony Man-Cho So, Wai Lam, Lidong Bing, Nigel Collier

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on several tasks. The experimental results show that our proposed SAM model outperforms many strong baseline models and it also verifies our theoretical analysis.
Researcher Affiliation | Collaboration | 1 Language Technology Lab, University of Cambridge; 2 The Chinese University of Hong Kong; 3 DAMO Academy, Alibaba Group
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | The source code of this paper can be obtained from https://github.com/fuzihaofzh/AnalyzeParameterEfficientFinetune
Open Datasets | Yes | We build our models with the jiant framework and test our models on several GLUE (Wang et al. 2018) and SuperGLUE (Wang et al. 2019) tasks. ... we choose several tasks including Corpus of Linguistic Acceptability (CoLA) (Warstadt, Singh, and Bowman 2019), Semantic Textual Similarity Benchmark (STSB) (Cer et al. 2017), Microsoft Research Paraphrase Corpus (MRPC) (Dolan and Brockett 2005), Recognizing Textual Entailment (RTE) (Dagan, Glickman, and Magnini 2005; Bentivogli et al. 2009), Commitment Bank (CB) (De Marneffe, Simons, and Tonhauser 2019), Choice of Plausible Alternatives (COPA) (Roemmele, Bejan, and Gordon 2011), and Winograd Schema Challenge (WSC) (Levesque, Davis, and Morgenstern 2012). (A dataset-loading sketch follows the table.)
Dataset Splits | Yes | Different from many previous works that train models without validation, we split the original training set by randomly sampling 10% as the new development set while using the remaining 90% samples to train the model. (A minimal split sketch follows the table.)
Hardware Specification | Yes | We run the models on NVIDIA TITAN RTX GPU with 24GB memory.
Software Dependencies | No | The paper mentions 'jiant framework', 'Adapter Hub', 'loralib', and 'transformers toolkit' but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | Instead of training the model for a fixed number of epochs, we use the new development set to do early-stop training by setting the tolerance for all models to 40. ... Following the setting of Guo, Rush, and Kim (2021), we set the sparsity to 0.005 for all models for a fair comparison. In SAM, we calculate ∇L(θ0)_i by accumulating the gradient for a few burn-in steps, as we cannot load all the training data into memory; the burn-in steps are chosen from {500, 600, 700, 800, 900, 1000, 2000} on the development set as a hyper-parameter. (A burn-in and parameter-selection sketch follows the table.)
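
The GLUE and SuperGLUE tasks listed in the Open Datasets row are all publicly available. The paper loads them through the jiant framework; the snippet below is only a quick alternative way to inspect the same data with the Hugging Face datasets library, not the authors' pipeline.

```python
from datasets import load_dataset

# GLUE tasks used in the paper (the authors load these via jiant, not via `datasets`).
cola = load_dataset("glue", "cola")    # Corpus of Linguistic Acceptability
stsb = load_dataset("glue", "stsb")    # Semantic Textual Similarity Benchmark
mrpc = load_dataset("glue", "mrpc")    # Microsoft Research Paraphrase Corpus
rte = load_dataset("glue", "rte")      # Recognizing Textual Entailment

# SuperGLUE tasks used in the paper.
cb = load_dataset("super_glue", "cb")      # Commitment Bank
copa = load_dataset("super_glue", "copa")  # Choice of Plausible Alternatives
wsc = load_dataset("super_glue", "wsc")    # Winograd Schema Challenge

print(cola)  # shows the train / validation / test splits shipped with GLUE
```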
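
The 10%/90% split described in the Dataset Splits row is straightforward to reproduce on any training set. The helper below is an illustrative sketch; the function name and the fixed seed are assumptions, not taken from the paper.

```python
import random

def split_train_dev(train_examples, dev_fraction=0.1, seed=42):
    """Randomly hold out a fraction of the original training set as a new development set."""
    examples = list(train_examples)
    random.Random(seed).shuffle(examples)
    n_dev = int(len(examples) * dev_fraction)
    new_dev = examples[:n_dev]       # 10% -> new development set
    new_train = examples[n_dev:]     # remaining 90% -> training set
    return new_train, new_dev
```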
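
The Experiment Setup row describes two mechanics worth unpacking: the per-parameter gradient magnitude ∇L(θ0)_i is accumulated over a burn-in phase, and only the top fraction of parameters given by the sparsity (0.005, i.e. 0.5%) is then kept trainable. The PyTorch sketch below illustrates that idea; the function name, the top-k thresholding, and the (inputs, labels) batch format are assumptions rather than the authors' exact implementation (see their released code for that).

```python
import torch

def select_trainable_params(model, data_loader, loss_fn, burn_in_steps=500, sparsity=0.005):
    """Accumulate |gradient| at the pretrained parameters over burn-in steps,
    then build masks that keep only the top `sparsity` fraction trainable."""
    grad_accum = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    data_iter = iter(data_loader)
    for _ in range(burn_in_steps):
        inputs, labels = next(data_iter)   # assumed batch format
        model.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                grad_accum[n] += p.grad.abs()

    # Rank all parameters by accumulated gradient magnitude and keep the top fraction.
    all_scores = torch.cat([g.flatten() for g in grad_accum.values()])
    k = max(1, int(sparsity * all_scores.numel()))
    threshold = torch.topk(all_scores, k).values.min()

    # 1 = trainable, 0 = frozen; during fine-tuning the mask can zero out gradients
    # of frozen entries, e.g. p.grad.mul_(masks[n]) before each optimizer step.
    masks = {n: (grad_accum[n] >= threshold).float() for n in grad_accum}
    return masks
```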