Adversarial Self-Attention for Language Understanding

Authors: Hongqiu Wu, Ruixue Ding, Hai Zhao, Pengjun Xie, Fei Huang, Min Zhang

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a comprehensive evaluation across a wide range of tasks for both pre-training and fine-tuning stages.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University; 3 School of Computer Science and Technology, Soochow University; 4 Damo Academy, Alibaba Group
Pseudocode | No | The paper includes diagrams to illustrate concepts but does not provide pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/gingasan/adversarialSA
Open Datasets | Yes | We experiment on five NLP tasks down to 10 datasets. Sentiment Analysis: Stanford Sentiment Treebank (SST-2) (Socher et al. 2013); Natural Language Inference (NLI): Multi-Genre Natural Language Inference (MNLI) (Williams, Nangia, and Bowman 2018) and Question Natural Language Inference (QNLI) (Wang et al. 2019); Semantic Similarity: Semantic Textual Similarity Benchmark (STS-B) (Cer et al. 2017) and Quora Question Pairs (QQP) (Wang et al. 2019); Named Entity Recognition (NER): WNUT-2017 (Aguilar et al. 2017); Machine Reading Comprehension (MRC): Dialogue-based Reading Comprehension (DREAM) (Sun et al. 2019); Robustness learning: Adversarial NLI (ANLI) (Nie et al. 2020) for NLI, PAWS-QQP (Zhang, Baldridge, and He 2019) for semantic similarity, and HellaSwag (Zellers et al. 2019) for MRC. (A data-loading sketch follows the table.)
Dataset Splits | Yes | We run three seeds for GLUE sub-tasks (the first five, since only two test submissions are allowed each day) and five seeds for the others.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models, or cloud computing instances) used to run the experiments.
Software Dependencies | No | The paper states 'Our implementations are based on transformers (Wolf et al. 2020)' but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | For pre-training, we continue to pre-train BERT based on MLM with ASA on the English Wikipedia corpus. We set the batch size to 128 and train both models for 20K steps with FP16. In addition, we do the experiments on both pre-training (τ = 0.1, Eq. 10) and fine-tuning (τ = 0.3, Eq. 9) stages. We turn off FP16, fix the batch size to 16, and do the experiments under different sequence lengths. We conduct experiments on a benign task (DREAM) and an adversarial task (HellaSwag), respectively, with τ selected in {0.3, 0.6, 1.0}. (A training-configuration sketch follows the table.)