Towards a Unified View of Parameter-Efficient Transfer Learning
Authors: Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive empirical studies across machine translation, text summarization, language understanding, and text classification benchmarks, we utilize the unified view to identify important design choices in previous methods. |
| Researcher Affiliation | Academia | Junxian He (Carnegie Mellon University, junxianh@cs.cmu.edu); Chunting Zhou (Carnegie Mellon University, chuntinz@cs.cmu.edu); Xuezhe Ma (University of Southern California, xuezhema@isi.edu); Taylor Berg-Kirkpatrick (UC San Diego, tberg@eng.ucsd.edu); Graham Neubig (Carnegie Mellon University, gneubig@cs.cmu.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/jxhe/unify-parameter-efficient-tuning. |
| Open Datasets | Yes | Datasets: We study four downstream tasks: (1) XSum (Narayan et al., 2018) is an English summarization dataset...; (2) English to Romanian translation using the WMT 2016 en-ro dataset (Bojar et al., 2016); (3) MNLI (Williams et al., 2018) is an English natural language inference dataset...; (4) SST2 (Socher et al., 2013) is an English sentiment classification benchmark... |
| Dataset Splits | Yes | Table 7: Dataset statistics of the four tasks. XSum #train 204,045 #dev 11,332 #test 11,334; WMT16 en-ro #train 610,320 #dev 1,999 #test 1,999; MNLI #train 392,702 #dev 9,815 #test 9,832; SST-2 #train 67,349 #dev 872 #test 1,821 |
| Hardware Specification | No | The paper mentions '48GB GPU memory' in Appendix A.1 and 'a data center powered entirely by renewable energy' in the Ethics Statement, but does not provide specific GPU models, CPU models, or detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using 'huggingface transformers library (Wolf et al., 2020)' and 'Adam optimizer (Kingma & Ba, 2015)' but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | We present some training hyperparameters of parameter-efficient tuning methods in Table 8. For all the tasks, we train with the Adam optimizer (Kingma & Ba, 2015) and use a polynomial learning-rate scheduler that linearly decays the learning rate throughout training. We set the warm-up steps to 0 for both the MT and summarization tasks; for the classification tasks, the learning rate is linearly warmed up from 0 over the first 6% of the total training steps before decaying. ... We set the dropout rate to 0.1 for all the tasks. |
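The experiment-setup row above gives the optimizer and learning-rate recipe in prose: Adam, a polynomial scheduler that linearly decays the learning rate, zero warm-up for MT and summarization, warm-up over the first 6% of steps for classification, and dropout 0.1. Below is a minimal sketch of how that recipe could be wired up, assuming PyTorch and the huggingface transformers library (which the paper cites without a version number); the backbone name, learning rate, and step count are illustrative placeholders, not values taken from the paper.

```python
# Sketch of the training setup described in the paper's Table 8 (hyperparameters),
# assuming PyTorch and huggingface transformers. Placeholder values are marked as such.
import torch
from transformers import (
    AutoModelForSequenceClassification,
    get_polynomial_decay_schedule_with_warmup,
)

# Placeholder backbone for a classification task (MNLI / SST-2);
# dropout is set to 0.1 for all tasks, as stated in the paper.
model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    hidden_dropout_prob=0.1,
)

total_steps = 10_000        # placeholder; depends on dataset size, batch size, and epochs
is_classification = True    # MNLI / SST-2 vs. MT / summarization

# Adam optimizer (Kingma & Ba, 2015); the learning rate here is a placeholder,
# the per-method values live in the paper's Table 8.
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)

# Warm-up: 0 steps for MT and summarization, first 6% of training steps for classification.
warmup_steps = int(0.06 * total_steps) if is_classification else 0

# A polynomial decay schedule with power=1.0 decays the learning rate linearly to lr_end.
scheduler = get_polynomial_decay_schedule_with_warmup(
    optimizer,
    num_warmup_steps=warmup_steps,
    num_training_steps=total_steps,
    power=1.0,
)
```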