Robust Fine-tuning via Perturbation and Interpolation from In-batch Instances
Authors: Shoujie Tong, Qingxiu Dong, Damai Dai, Yifan Song, Tianyu Liu, Baobao Chang, Zhifang Sui
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various tasks in the GLUE benchmark show that MATCH-TUNING consistently outperforms vanilla fine-tuning by 1.64 points. Moreover, MATCH-TUNING exhibits remarkable robustness to adversarial attacks and data imbalance. We conduct a comprehensive evaluation of MATCH-TUNING on the GLUE benchmark. |
| Researcher Affiliation | Collaboration | Key Laboratory of Computational Linguistics, Peking University; Tencent Cloud Xiaowei |
| Pseudocode | No | The paper includes mathematical equations for the proposed method but does not provide a structured pseudocode block or algorithm. |
| Open Source Code | Yes | Our code is available at https://github.com/tongshoujie/MATCH-TUNING |
| Open Datasets | Yes | We conduct experiments on four main datasets in GLUE [Wang et al., 2019] to evaluate the general performance. |
| Dataset Splits | No | The paper mentions evaluating on the 'GLUE development set' and the 'AdvGLUE validation set' but does not specify exact split percentages or sample counts for these splits. It also mentions reporting results over '10 random seeds' but gives no details on whether these seeds affect data partitioning. (The standard GLUE splits are illustrated in the first sketch after this table.) |
| Hardware Specification | Yes | All the methods are based on BERT-LARGE and tested on a single NVIDIA A40 GPU. |
| Software Dependencies | No | The paper states, 'We conduct our experiments based on the Hugging Face transformers library'. However, it does not provide a specific version number for the library or any other software dependencies. |
| Experiment Setup | No | The paper states, 'We conduct our experiments based on the Hugging Face transformers library and follow the default hyper-parameters and settings unless noted otherwise.' It refers to the batch size (n) in a formula but does not provide specific numerical values for hyperparameters such as learning rate, batch size, or number of epochs used in the experiments. It only mentions averaging over '10 random seeds'. (A hedged reproduction sketch based on these library defaults follows the table.) |
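
The section above notes that the evaluation uses "four main datasets in GLUE" and their development sets, without enumerating the tasks or sample counts. The following is a minimal sketch, assuming the standard GLUE releases distributed through the Hugging Face `datasets` library; the task names listed are placeholders, not the paper's confirmed selection.

```python
# Minimal sketch (assumption: standard GLUE releases via the `datasets` library).
# The paper evaluates on "four main datasets in GLUE"; the task names below are
# placeholders, since the review above does not enumerate them.
from datasets import load_dataset

for task in ["sst2", "qnli", "qqp", "mnli"]:
    ds = load_dataset("glue", task)
    # Each GLUE task ships with fixed train/validation/test splits, so no
    # custom split percentages are needed to reconstruct the evaluation data.
    print(task, {split: len(ds[split]) for split in ds})
```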
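Similarly, since neither the `transformers` version nor the training hyper-parameters are reported, a reproduction can only follow the quoted statement about library defaults. The sketch below is an assumption-laden baseline, not the authors' MATCH-TUNING implementation (that code is in their repository): it fine-tunes `bert-large-uncased` on one GLUE task with `Trainer` left at its defaults, and the seed value and task choice are placeholders.

```python
# Hedged baseline sketch: BERT-LARGE fine-tuned with Hugging Face `transformers`
# using library-default hyper-parameters, mirroring the paper's statement that it
# follows "the default hyper-parameters and settings unless noted otherwise".
# Model name, task, and seed are assumptions, not values reported by the paper.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments, set_seed)

set_seed(42)  # the paper averages over 10 random seeds; 42 is a placeholder

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-large-uncased", num_labels=2)

ds = load_dataset("glue", "sst2")  # placeholder task

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

ds = ds.map(tokenize, batched=True)

# TrainingArguments left at library defaults; only the output directory and
# seed are set explicitly, since the paper reports no hyper-parameter values.
args = TrainingArguments(output_dir="out", seed=42)
trainer = Trainer(model=model, args=args,
                  train_dataset=ds["train"],
                  eval_dataset=ds["validation"],
                  tokenizer=tokenizer)
trainer.train()
```

Pinning the exact `transformers` and `datasets` versions used for such a run would address the missing dependency information flagged in the table.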