Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively

Authors: Haojie Zhang, Ge Li, Jia Li, Zhongjin Zhang, Yuqi Zhu, Zhi Jin

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results with variable pre-trained language models.
Researcher Affiliation | Academia | Haojie Zhang, Ge Li, Jia Li, Zhongjin Zhang, Yuqi Zhu, Zhi Jin; Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; Institute of Software, EECS, Peking University, Beijing, China. Contact: zhanghaojie@stu.pku.edu.cn, lige@pku.edu.cn, lijia@stu.pku.edu.cn, zjz123@stu.pku.edu.cn, zhuyuqi97@gmail.com, zhijin@pku.edu.cn
Pseudocode | Yes | Algorithm 1: Training Algorithm for DPS
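A rough, hedged sketch of what adaptive subnetwork optimization can look like in code is given below. It is not the paper's Algorithm 1: the squared-gradient (Fisher-style) importance score, the per-tensor top-k rule, and the `keep_ratio` parameter are assumptions made for illustration only.

```python
import torch

def select_subnetwork(model, keep_ratio=0.3):
    # Hypothetical selection rule: after a backward pass, keep the entries of
    # each parameter tensor whose squared gradients (a rough Fisher proxy)
    # are largest; everything else is frozen for the coming updates.
    masks = {}
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        score = param.grad.detach() ** 2
        k = max(1, int(keep_ratio * score.numel()))
        threshold = torch.topk(score.flatten(), k).values.min()
        masks[name] = (score >= threshold).float()
    return masks

def masked_step(model, batch, optimizer, masks):
    # One fine-tuning step in which only the selected subnetwork is updated.
    loss = model(**batch).loss
    loss.backward()
    for name, param in model.named_parameters():
        if param.grad is not None and name in masks:
            param.grad.mul_(masks[name])
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```

In a DPS-style loop the masks would be recomputed periodically so that the selected subnetwork adapts to the current gradients rather than staying fixed; the actual selection criterion and schedule should be taken from the paper and its released code.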
Open Source Code | Yes | We release our code at https://github.com/ZhangHaojie077/DPS.
Open Datasets | Yes | GLUE benchmark: Following previous studies [Lee et al., 2020, Dodge et al., 2020, Zhang et al., 2021], we conduct a series of extensive experiments on eight datasets from the GLUE benchmark [Wang et al., 2019]. NLI datasets: We evaluate and probe the generalization ability of DPS on several Natural Language Inference (NLI) tasks, including SNLI [Bowman et al., 2015], MNLI [Williams et al., 2018], MNLI-M [Williams et al., 2018], RTE [Bentivogli et al., 2009], SICK [Marelli et al., 2014], and SciTail [Khot et al., 2018].
Dataset Splits | Yes | We follow several previous studies [Phang et al., 2018, Lee et al., 2020, Dodge et al., 2020, Aghajanyan et al., 2021, Zhang et al., 2021, Xu et al., 2021] that fine-tune on the training sets and report the results on the development sets.
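As a hedged illustration of this train/dev protocol, the snippet below loads one GLUE task with the Hugging Face `datasets` library and separates the training split used for fine-tuning from the validation (development) split used for reporting; this exact loading code is an assumption, not something shown in the paper.

```python
from datasets import load_dataset

# RTE is one of the eight GLUE tasks used in the paper; shown here as an example.
rte = load_dataset("glue", "rte")

train_set = rte["train"]       # used for fine-tuning
dev_set = rte["validation"]    # results are reported on this split

print(len(train_set), "training examples,", len(dev_set), "development examples")
```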
Hardware Specification | No | The paper's checklist states: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please refer to Appendix C.' However, Appendix C is not included in the provided text, and the main body does not specify GPU models, CPU models, or other hardware details.
Software Dependencies | No | We use the pre-trained models and codes provided by Hugging Face [Wolf et al., 2020]. Appendix C provides specific hyper-parameter details, and unless noted otherwise, we follow the default hyper-parameter setup of Hugging Face. The paper relies on Hugging Face but does not give version numbers for any software libraries or dependencies.
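Because no versions are pinned, anyone reproducing the setup has to record their own environment. The snippet below is a minimal, assumed way to log the versions of the libraries involved; it is not taken from the paper or its repository.

```python
import torch
import transformers
import datasets

# Record the library versions actually used, since the paper does not specify them.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("datasets:", datasets.__version__)
```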
Experiment Setup | Yes | Section 3.2 (Experimental Setup); Appendix C provides specific hyper-parameter details, and unless noted otherwise, we follow the default hyper-parameter setup of Hugging Face; the hyper-parameter search spaces of different fine-tuning regularization methods are supplemented in Appendix D.
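A minimal sketch of what "the default hyper-parameter setup of Hugging Face" could look like in code, assuming the `transformers` Trainer API, is shown below; the explicit values (output directory, learning rate, batch size, epochs) are illustrative placeholders, not the settings from the paper's Appendix C or D.

```python
from transformers import TrainingArguments

# Placeholder configuration; consult the paper's Appendix C/D for the real values and search spaces.
args = TrainingArguments(
    output_dir="dps-glue-rte",          # placeholder output path
    learning_rate=2e-5,                 # placeholder
    per_device_train_batch_size=16,     # placeholder
    num_train_epochs=3,                 # placeholder
)
```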