Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Fine-Tuning Pre-Trained Language Models Effectively by Optimizing Subnetworks Adaptively
Authors: Haojie Zhang, Ge Li, Jia Li, Zhongjin Zhang, Yuqi Zhu, Zhi Jin
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the GLUE benchmark show that DPS outperforms previous fine-tuning methods in terms of overall performance and stability, and consistently achieves better results with variable pre-trained language models. |
| Researcher Affiliation | Academia | Haojie Zhang¹, Ge Li¹, Jia Li¹, Zhongjin Zhang¹, Yuqi Zhu¹, Zhi Jin¹. ¹Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; Institute of Software, EECS, Peking University, Beijing, China |
| Pseudocode | Yes | Algorithm 1 Training Algorithm for DPS |
| Open Source Code | Yes | We release our code at https://github.com/ZhangHaojie077/DPS. |
| Open Datasets | Yes | 'GLUE benchmark. Following previous studies [Lee et al., 2020, Dodge et al., 2020, Zhang et al., 2021], we conduct a series of extensive experiments on eight datasets from the GLUE benchmark [Wang et al., 2019].' and 'NLI Datasets. We evaluate and probe the generalization ability of DPS on several Natural Language Inference (NLI) tasks, including SNLI [Bowman et al., 2015], MNLI [Williams et al., 2018], MNLI-M [Williams et al., 2018], RTE [Bentivogli et al., 2009], SICK [Marelli et al., 2014] and SciTail [Khot et al., 2018].' |
| Dataset Splits | Yes | we follow several previous studies [Phang et al., 2018, Lee et al., 2020, Dodge et al., 2020, Aghajanyan et al., 2021, Zhang et al., 2021, Xu et al., 2021] that fine-tune on the training sets and report the results on the development sets. |
| Hardware Specification | No | The paper states, 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please refer to Appendix C.' However, Appendix C is not provided in the given text, and the main body of the paper does not specify any particular GPU models, CPU models, or other detailed hardware specifications. |
| Software Dependencies | No | 'We use the pre-trained models and codes provided by Hugging Face [Wolf et al., 2020]. Appendix C provides specific hyper-parameter details, and unless noted otherwise, we follow the default hyper-parameter setup of Hugging Face.' The paper mentions using Hugging Face, but does not provide specific version numbers for software libraries or dependencies. |
| Experiment Setup | Yes | Section 3.2 (Experimental Setup): 'Appendix C provides specific hyper-parameter details, and unless noted otherwise, we follow the default hyper-parameter setup of Hugging Face.' and 'The hyper-parameter search spaces of different fine-tuning regularization methods are supplemented in Appendix D.' |