Towards a Unified View of Parameter-Efficient Transfer Learning
Authors: Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. |
| Researcher Affiliation | Academia | Junxian He (Carnegie Mellon University, junxianh@cs.cmu.edu); Chunting Zhou (Carnegie Mellon University, chuntinz@cs.cmu.edu); Xuezhe Ma (University of Southern California, xuezhema@isi.edu); Taylor Berg-Kirkpatrick (UC San Diego, tberg@eng.ucsd.edu); Graham Neubig (Carnegie Mellon University, gneubig@cs.cmu.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/jxhe/unify-parameter-efficient-tuning. |
| Open Datasets | Yes | Datasets: We study four downstream tasks: (1) XSum (Narayan et al., 2018) is an English summarization dataset...; (2) English to Romanian translation using the WMT 2016 en-ro dataset (Bojar et al., 2016); (3) MNLI (Williams et al., 2018) is an English natural language inference dataset...; (4) SST2 (Socher et al., 2013) is an English sentiment classification benchmark... |
| Dataset Splits | Yes | Table 7: Dataset statistics of the four tasks. XSum #train 204,045 #dev 11,332 #test 11,334; WMT16 en-ro #train 610,320 #dev 1,999 #test 1,999; MNLI #train 392,702 #dev 9,815 #test 9,832; SST-2 #train 67,349 #dev 872 #test 1,821 |
| Hardware Specification | No | The paper mentions '48GB GPU memory' in Appendix A.1 and 'a data center powered entirely by renewable energy' in the Ethics Statement, but does not provide specific GPU models, CPU models, or detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using 'huggingface transformers library (Wolf et al., 2020)' and 'Adam optimizer (Kingma & Ba, 2015)' but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | We present some training hyperparameters of parameter-efficient tuning methods in Table 8. For all the tasks, we train with the Adam optimizer (Kingma & Ba, 2015) and use a polynomial learning-rate scheduler that linearly decays the learning rate throughout training. We set the warm-up steps to 0 for both the MT and summarization tasks; for the classification tasks, the learning rate is linearly warmed up from 0 over the first 6% of the total training steps before decaying. ... We set the dropout rate to 0.1 for all the tasks. |
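The experiment-setup row above gives the optimizer and learning-rate recipe in prose: Adam, a polynomial scheduler that linearly decays the learning rate, zero warm-up for MT and summarization, warm-up over the first 6% of steps for classification, and dropout 0.1. Below is a minimal sketch of how that recipe could be wired up, assuming PyTorch and the huggingface transformers library (which the paper cites without a version number); the backbone name, learning rate, and step count are illustrative placeholders, not values taken from the paper.

```python
# Sketch of the training setup described in the paper's Table 8 (hyperparameters),
# assuming PyTorch and huggingface transformers. Placeholder values are marked as such.
import torch
from transformers import (
    AutoModelForSequenceClassification,
    get_polynomial_decay_schedule_with_warmup,
)

# Placeholder backbone for a classification task (MNLI / SST-2);
# dropout is set to 0.1 for all tasks, as stated in the paper.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    hidden_dropout_prob=0.1,
)

total_steps = 10_000        # placeholder; depends on dataset size, batch size, and epochs
is_classification = True    # MNLI / SST-2 vs. MT / summarization

# Adam optimizer (Kingma & Ba, 2015); the learning rate here is a placeholder,
# the per-method values live in the paper's Table 8.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# Warm-up: 0 steps for MT and summarization, first 6% of training steps for classification.
warmup_steps = int(0.06 * total_steps) if is_classification else 0

# A polynomial decay schedule with power=1.0 decays the learning rate linearly to lr_end.
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
    power=1.0,
)
```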