Toward Adversarial Training on Contextualized Language Representation

Authors: Hongqiu Wu, Yongxiang Liu, Hanwen Shi, Hai Zhao, Min Zhang

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Beyond the success story of adversarial training (AT) in the recent text domain on top of pre-trained language models (PLMs), our empirical study showcases the inconsistent gains from AT on some tasks, e.g. commonsense reasoning, named entity recognition.
Researcher Affiliation | Academia | Hongqiu Wu (1,2), Yongxiang Liu (1,2), Hanwen Shi (1,2), Hai Zhao (1,2), Min Zhang (3). 1: Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2: Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; 3: School of Computer Science and Technology, Soochow University, Suzhou, China.
Pseudocode | Yes | Algorithm 1: Contextualized representation-Adversarial Training (CreAT); a hedged sketch of one such training step is given after the table.
Open Source Code | Yes | https://github.com/gingasan/CreAT
Open Datasets | Yes | For the training corpus, we use a subset (nearly 100GB) of C4 (Raffel et al., 2020).
Dataset Splits | Yes | For dev sets (upper), we report the results over five runs, giving the mean and variance for each. For test sets (bottom), the results are taken from the official leaderboard, where CreAT achieved the new state-of-the-art on March 16, 2022.
Hardware Specification | Yes | Training a base/large-size model takes about 30/100 hours on 16 V100 GPUs with FP16.
Software Dependencies | No | The paper mentions "The implementation is based on transformers (Wolf et al., 2020)" but does not specify version numbers for this or any other software dependency.
Experiment Setup | Yes | Table 7: Hyperparameters for pre-training. Table 8: Hyperparameters for fine-tuning BERT (dp: dropout rate, bsz: batch size, lr: learning rate, wd: weight decay, msl: max sequence length, wp: warmup, ep: epochs); an illustrative config sketch using these field names follows the table.
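
The pseudocode row above refers to Algorithm 1 (CreAT). As a rough illustration only, the PyTorch sketch below shows one plausible shape of such a step: a perturbation on the word embeddings is optimized to both raise the task loss and deviate the encoder's contextualized representation from its clean value, and the model is then updated on the perturbed input. The function name `creat_step`, the hyperparameters (`adv_steps`, `adv_lr`, `adv_init`, `lam`), and the MSE deviation term are assumptions for illustration, not the paper's exact Algorithm 1.

```python
import torch
import torch.nn.functional as F


def creat_step(model, batch, optimizer, adv_steps=1, adv_lr=1e-1, adv_init=1e-2, lam=1.0):
    """One adversarial training step in the spirit of CreAT (illustrative sketch).

    Assumes a HuggingFace-style classification model that accepts
    `inputs_embeds` / `attention_mask` / `labels` and can return hidden states.
    The deviation term and all hyperparameter names are assumptions,
    not the paper's exact Algorithm 1.
    """
    model.train()
    input_ids = batch["input_ids"]
    attention_mask = batch["attention_mask"]
    labels = batch["labels"]

    # Word embeddings used as the attack surface (detached: the attack
    # only needs gradients with respect to the perturbation delta).
    embeds = model.get_input_embeddings()(input_ids).detach()

    # Clean forward pass to cache the contextualized representation.
    with torch.no_grad():
        clean = model(inputs_embeds=embeds, attention_mask=attention_mask,
                      labels=labels, output_hidden_states=True)
        clean_repr = clean.hidden_states[-1]

    # Inner maximization: find a perturbation that raises the task loss
    # and pushes the contextualized representation away from its clean value.
    delta = torch.zeros_like(embeds).uniform_(-adv_init, adv_init)
    delta.requires_grad_(True)
    for _ in range(adv_steps):
        adv = model(inputs_embeds=embeds + delta, attention_mask=attention_mask,
                    labels=labels, output_hidden_states=True)
        deviation = F.mse_loss(adv.hidden_states[-1], clean_repr)
        attack_obj = adv.loss + lam * deviation
        grad, = torch.autograd.grad(attack_obj, delta)
        delta = (delta + adv_lr * grad / (grad.norm() + 1e-12)).detach()
        delta.requires_grad_(True)

    # Outer minimization: update the model on the adversarial input.
    optimizer.zero_grad()
    adv_embeds = model.get_input_embeddings()(input_ids) + delta.detach()
    out = model(inputs_embeds=adv_embeds, attention_mask=attention_mask, labels=labels)
    out.loss.backward()
    optimizer.step()
    return out.loss.item()
```

A typical loop would call `creat_step(model, batch, optimizer)` once per mini-batch; FP16 training, as the hardware row notes, would additionally wrap the forward passes in `torch.cuda.amp.autocast` with a `GradScaler`.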
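
For the experiment-setup row, the Table 8 legend expands into the fields below. The sketch simply names those fine-tuning hyperparameters as a config object; the default values are generic placeholders for illustration, not the settings reported in Tables 7 or 8 of the paper.

```python
from dataclasses import dataclass


@dataclass
class FinetuneConfig:
    """Fine-tuning hyperparameters named in the Table 8 legend.

    The default values are generic placeholders, not the paper's settings.
    """
    dp: float = 0.1   # dropout rate
    bsz: int = 32     # batch size
    lr: float = 2e-5  # learning rate
    wd: float = 0.01  # weight decay
    msl: int = 128    # max sequence length
    wp: float = 0.1   # warmup (proportion of training steps)
    ep: int = 3       # number of epochs


# Example: instantiate with task-specific overrides.
cfg = FinetuneConfig(bsz=16, msl=256)
```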