Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
Authors: Haojie Zhang, Ge Li, Jia Li, Zhongjin Zhang, Yuqi Zhu, Zhi Jin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results with variable pre-trained language models. |
| Researcher Affiliation | Academia | Haojie Zhang1, Ge Li1, Jia Li1, Zhongjin Zhang1, Yuqi Zhu1, Zhi Jin1; 1Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; Institute of Software, EECS, Peking University, Beijing, China; zhanghaojie@stu.pku.edu.cn, lige@pku.edu.cn, lijia@stu.pku.edu.cn, zjz123@stu.pku.edu.cn, zhuyuqi97@gmail.com, zhijin@pku.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training Algorithm for DPS (an illustrative sketch follows the table) |
| Open Source Code | Yes | We release our code at https://github.com/ZhangHaojie077/DPS. |
| Open Datasets | Yes | GLUE benchmark: Following previous studies [Lee et al., 2020, Dodge et al., 2020, Zhang et al., 2021], we conduct a series of extensive experiments on eight datasets from the GLUE benchmark [Wang et al., 2019]. NLI datasets: We evaluate and probe the generalization ability of DPS on several Natural Language Inference (NLI) tasks, including SNLI [Bowman et al., 2015], MNLI [Williams et al., 2018], MNLI-M [Williams et al., 2018], RTE [Bentivogli et al., 2009], SICK [Marelli et al., 2014] and SciTail [Khot et al., 2018]. |
| Dataset Splits | Yes | we follow several previous studies [Phang et al., 2018, Lee et al., 2020, Dodge et al., 2020, Aghajanyan et al., 2021, Zhang et al., 2021, Xu et al., 2021] that fine-tune on the training sets and report the results on the development sets. |
| Hardware Specification | No | The paper states, 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please refer to Appendix C.' However, Appendix C is not provided in the given text, and the main body of the paper does not specify any particular GPU models, CPU models, or other detailed hardware specifications. |
| Software Dependencies | No | We use the pre-trained models and codes provided by Hugging Face [Wolf et al., 2020]. Appendix C provides specific hyper-parameter details, and unless noted otherwise, we follow the default hyper-parameter setup of Hugging Face. The paper mentions using Hugging Face, but does not provide specific version numbers for software libraries or dependencies. |
| Experiment Setup | Yes | Section 3.2 Experimental Setup; Appendix C provides specific hyper-parameter details, and unless noted otherwise, we follow the default hyper-parameter setup of Hugging Face; and The hyper-parameter search spaces of different fine-tuning regularization methods are supplemented in Appendix D. |
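
The Pseudocode and Software Dependencies rows reference Algorithm 1 (the DPS training algorithm) and the Hugging Face stack. The sketch below is only a rough, non-authoritative illustration of adaptive subnetwork fine-tuning with that stack: it masks the gradients of parameters outside a periodically re-selected subnetwork. The selection rule (keep the largest-magnitude gradient entries), the `keep_ratio` and `update_interval` values, and the `bert-base-uncased` checkpoint are assumptions made for illustration, not the authors' exact Algorithm 1 or hyper-parameters.

```python
# Illustrative sketch only: adaptive subnetwork fine-tuning in the spirit of
# Algorithm 1 (DPS). The gradient-magnitude selection rule, keep_ratio, and
# update_interval below are assumed simplifications, not the paper's procedure.
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"  # any Hugging Face PLM checkpoint (assumed)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
optimizer = AdamW(model.parameters(), lr=2e-5)

keep_ratio = 0.3        # fraction of entries allowed to update (assumed value)
update_interval = 100   # how often the subnetwork is re-selected (assumed value)
masks = {}              # per-parameter 0/1 masks defining the current subnetwork


def select_subnetwork(model, keep_ratio):
    """Build binary masks keeping the largest-|grad| entries of each parameter."""
    new_masks = {}
    for name, param in model.named_parameters():
        if param.grad is None:
            new_masks[name] = torch.ones_like(param)
            continue
        k = max(1, int(keep_ratio * param.numel()))
        # k-th largest |grad| value serves as the per-parameter threshold
        threshold = param.grad.abs().flatten().kthvalue(param.numel() - k + 1).values
        new_masks[name] = (param.grad.abs() >= threshold).float()
    return new_masks


def training_step(batch, step):
    """One fine-tuning step that only updates the selected subnetwork."""
    global masks
    outputs = model(**batch)
    outputs.loss.backward()
    if step % update_interval == 0:           # periodically re-select the subnetwork
        masks = select_subnetwork(model, keep_ratio)
    for name, param in model.named_parameters():
        if param.grad is not None and name in masks:
            param.grad.mul_(masks[name])      # zero gradients outside the subnetwork
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```

Here a `batch` is an ordinary dict of tokenizer outputs plus a `labels` tensor, e.g. `batch = tokenizer("premise", "hypothesis", return_tensors="pt"); batch["labels"] = torch.tensor([1])`; the released repository linked in the Open Source Code row remains the authoritative implementation.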