Toward Adversarial Training on Contextualized Language Representation
Authors: Hongqiu Wu, Yongxiang Liu, Hanwen Shi, Hai Zhao, Min Zhang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Beyond the success story of adversarial training (AT) in the recent text domain on top of pre-trained language models (PLMs), our empirical study showcases the inconsistent gains from AT on some tasks, e.g. commonsense reasoning, named entity recognition. |
| Researcher Affiliation | Academia | Hongqiu Wu1,2 & Yongxiang Liu1,2 & Hanwen Shi1,2 & Hai Zhao1,2, & Min Zhang3 1Department of Computer Science and Engineering, Shanghai Jiao Tong University 2Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 3School of Computer Science and Technology, Soochow University, Suzhou, China |
| Pseudocode | Yes | Algorithm 1 Contextualized representation-Adversarial Training |
| Open Source Code | Yes | https://github.com/gingasan/CreAT |
| Open Datasets | Yes | For the training corpus, we use a subset (nearly 100GB) of C4 (Raffel et al., 2020). |
| Dataset Splits | Yes | For dev sets (upper), we report the results over five runs and report the mean and variance for each. For test sets (bottom), the results are taken from the official leaderboard, where Cre AT achieved the new state-of-the-art on March 16, 2022. |
| Hardware Specification | Yes | Training a base/large-size model takes about 30/100 hours on 16 V100 GPUs with FP16. |
| Software Dependencies | No | The paper mentions “The implementation is based on transformers (Wolf et al., 2020)” but does not specify version numbers for this or any other software dependencies. |
| Experiment Setup | Yes | Table 7: Hyperparameters for pre-training. Table 8: Hyperparameters for fine-tuning BERT. (dp: dropout rate, bsz: batch size, lr: learning rate, wd: weight decay, msl: max sequence length, wp: warmup, ep: epochs). |
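
As a companion to the pseudocode and setup rows above, the following is a minimal PyTorch sketch of the general idea of adversarial training applied to contextualized representations, i.e. perturbing the encoder's output hidden states rather than the input embeddings. It is not a reproduction of the paper's Algorithm 1 (CreAT): the toy encoder, pooling head, `epsilon`, the single gradient-ascent step, and the clean-plus-adversarial loss combination are all illustrative assumptions, not values or design choices taken from the paper.

```python
# Illustrative sketch: adversarial training on contextualized (encoder-output)
# representations. NOT the paper's Algorithm 1 (CreAT); all components below
# are assumptions chosen to keep the example small and runnable.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoderClassifier(nn.Module):
    """Toy Transformer encoder + linear head standing in for a PLM."""

    def __init__(self, vocab_size=1000, d_model=64, num_labels=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_labels)

    def contextualize(self, input_ids):
        # Contextualized representations: the encoder's output hidden states.
        return self.encoder(self.embed(input_ids))

    def classify(self, hidden):
        # Mean-pool the (possibly perturbed) hidden states, then classify.
        return self.head(hidden.mean(dim=1))


def adversarial_step(model, input_ids, labels, epsilon=1e-2):
    """One training step: clean loss plus loss under a perturbation of the
    contextualized representations found by a single gradient-ascent step.
    (epsilon and the single-step ascent are illustrative assumptions.)"""
    hidden = model.contextualize(input_ids)
    clean_loss = F.cross_entropy(model.classify(hidden), labels)

    # Find a worst-case direction on the hidden states; detach so the ascent
    # step does not itself contribute gradients to the model parameters.
    hidden_adv = hidden.detach().requires_grad_(True)
    adv_obj = F.cross_entropy(model.classify(hidden_adv), labels)
    grad, = torch.autograd.grad(adv_obj, hidden_adv)
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)

    # Apply the fixed perturbation to the contextualized representations and
    # take the adversarial loss; gradients flow through the encoder via `hidden`.
    adv_loss = F.cross_entropy(model.classify(hidden + delta.detach()), labels)
    return clean_loss + adv_loss


if __name__ == "__main__":
    torch.manual_seed(0)
    model = ToyEncoderClassifier()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
    input_ids = torch.randint(0, 1000, (8, 16))  # fake batch of token ids
    labels = torch.randint(0, 2, (8,))
    loss = adversarial_step(model, input_ids, labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"combined loss: {loss.item():.4f}")
```

For the authors' actual algorithm, loss formulation, and hyperparameters, refer to Algorithm 1 and Tables 7–8 in the paper and the released code at https://github.com/gingasan/CreAT.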