AD-DROP: Attribution-Driven Dropout for Robust Language Model Fine-Tuning
Authors: Tao Yang, Jinghao Deng, Xiaojun Quan, Qifan Wang, Shaoliang Nie
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various benchmarks show that AD-DROP yields consistent improvements over baselines. Analysis further confirms that AD-DROP serves as a strategic regularizer to prevent overfitting during fine-tuning. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Sun Yat-sen University; (2) Meta AI |
| Pseudocode | Yes | Algorithm 1 Cross-tuning |
| Open Source Code | Yes | Our code is available at https://github.com/TaoYang225/AD-DROP. |
| Open Datasets | Yes | We conduct our main experiments on eight tasks of the GLUE benchmark [31], including SST-2 [38], MNLI [39], QNLI [40], QQP [41], CoLA [42], STS-B [43], MRPC [37], and RTE [44]. ... we conduct experiments on Named Entity Recognition (CoNLL-2003 [32]) and Machine Translation (WMT 2016 [33]) datasets... Besides, we also evaluate AD-DROP on two out-of-distribution (OOD) datasets, including HANS [34] and PAWS-X [35]. |
| Dataset Splits | Yes | After each epoch of training, we evaluate the model on the development set. Two baseline dropping strategies (i.e., dropping by random sampling and without dropping any position) are employed for comparison. We plot the loss curves of the model with these dropping strategies on both training and development sets in Figure 2. |
| Hardware Specification | Yes | We train the selected PrLMs on GeForce RTX 3090 GPUs. |
| Software Dependencies | No | We implement our AD-DROP in PyTorch with the Transformers package [47]. |
| Experiment Setup | Yes | We tune the learning rate in {1e-5, 2e-5, 3e-5} and the batch size in {16, 32, 64}. ... The two critical hyperparameters p and q are searched within [0.1, 0.9] with step size 0.1. For integrated gradient in Eq. (3), we follow Hao et al. [23] and set m to 20. |
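
The Experiment Setup row describes searching the two drop hyperparameters p and q over [0.1, 0.9] (step 0.1) and computing attention attribution with integrated gradients (m = 20 steps, following Hao et al.). Below is a minimal PyTorch sketch of how such an attribution-driven attention mask could be built. The tensor shapes, the assumption that attribution scores are already computed, the top-p candidate selection, and the additive `-inf` masking convention are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def attribution_dropout_mask(attribution: torch.Tensor, p: float, q: float) -> torch.Tensor:
    """Build an additive self-attention mask from attribution scores (hedged sketch).

    attribution: [batch, heads, seq, seq] attribution score of each attention position.
    p: candidate ratio -- fraction of the highest-attribution positions in each
       attention row that are eligible for dropping (searched in [0.1, 0.9]).
    q: drop ratio -- fraction of candidate positions actually dropped this pass.
    Returns a mask of zeros with -inf at dropped positions, to be added to the
    raw attention logits before the softmax.
    """
    batch, heads, seq, _ = attribution.shape
    num_candidates = max(1, int(p * seq))

    # Top-p highest-attribution positions per attention row become drop candidates.
    _, cand_idx = attribution.topk(num_candidates, dim=-1)

    # Randomly drop a fraction q of the candidates on this forward pass.
    drop = torch.rand(batch, heads, seq, num_candidates, device=attribution.device) < q
    drop_vals = torch.zeros_like(drop, dtype=attribution.dtype)
    drop_vals[drop] = float("-inf")

    mask = torch.zeros_like(attribution)
    mask.scatter_(-1, cand_idx, drop_vals)
    return mask

# Toy usage: random attribution scores for a batch of 2, 12 heads, sequence length 16.
scores = torch.rand(2, 12, 16, 16)
additive_mask = attribution_dropout_mask(scores, p=0.3, q=0.3)
```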
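
The Pseudocode row cites Algorithm 1 (cross-tuning), which in the paper alternates plain fine-tuning epochs with AD-DROP epochs so that high-attribution positions are not dropped in every epoch, evaluating on the development set after each epoch. The skeleton below only sketches that alternation; the callables `finetune_epoch`, `addrop_epoch`, and `evaluate`, as well as the even/odd ordering, are hypothetical placeholders rather than the authors' code.

```python
from typing import Callable

def cross_tuning(
    finetune_epoch: Callable[[], None],  # one vanilla fine-tuning epoch
    addrop_epoch: Callable[[], None],    # one epoch trained with attribution-driven dropping
    evaluate: Callable[[], float],       # dev-set metric, checked after every epoch
    num_epochs: int,
) -> float:
    """Alternate plain fine-tuning and AD-DROP epochs; return the best dev score."""
    best_dev = float("-inf")
    for epoch in range(num_epochs):
        if epoch % 2 == 0:
            finetune_epoch()   # even epochs: standard fine-tuning (ordering is an assumption)
        else:
            addrop_epoch()     # odd epochs: AD-DROP with the chosen p and q
        best_dev = max(best_dev, evaluate())
    return best_dev
```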