Towards Robustness Against Natural Language Word Substitutions
Authors: Xinshuai Dong, Anh Tuan Luu, Rongrong Ji, Hong Liu
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that ASCC-defense outperforms the current state-of-the-arts in terms of robustness on two prevailing NLP tasks, i.e., sentiment analysis and natural language inference, concerning several attacks across multiple model architectures. Experimental results show that our method consistently yields models that are more robust than the state-of-the-arts with significant margins; e.g., we achieve 79.0% accuracy under Genetic attacks on IMDB while the state-of-the-art performance is 75.0%. |
| Researcher Affiliation | Collaboration | Xinshuai Dong (Nanyang Technological University, Singapore; dongxinshuai@outlook.com); Anh Tuan Luu (Nanyang Technological University, Singapore, and VinAI Research, Vietnam; anhtuan.luu@ntu.edu.sg); Rongrong Ji (Xiamen University, China; rrji@xmu.edu.cn); Hong Liu (National Institute of Informatics, Japan; hliu@nii.ac.jp) |
| Pseudocode | Yes | Algorithm 1 ASCC-defense. Input: dataset D, parameters of Adam optimizer. Output: parameters θ and φ. 1: repeat 2: for random mini-batch D do 3: for every x, y in the mini-batch (in parallel) do 4: Solve the inner maximization in Eq. 11 to find the optimal ŵ by Adam; 5: Compute v̂(x) by Eq. 10 using ŵ and then compute the inner maximum in Eq. 11; 6: end for 7: Update θ and φ by Adam to minimize the calculated inner maximum; 8: end for 9: until the training converges. (A hedged code sketch of this training loop follows the table.) |
| Open Source Code | Yes | Our code will be available at https://github.com/dongxinshuai/ASCC. |
| Open Datasets | Yes | Tasks and datasets. We focus on two prevailing NLP tasks to evaluate the robustness and compare our method to the state-of-the-arts: (i) Sentiment analysis on the IMDB dataset (Maas et al., 2011). (ii) Natural language inference on the SNLI dataset (Bowman et al., 2015). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide training/validation/test dataset splits with specific percentages, counts, or references to predefined validation splits. |
| Hardware Specification | Yes | All models are trained using the GeForce GTX 1080 GPU. |
| Software Dependencies | No | The paper mentions software like 'NLTK' and 'Adam optimizer' but does not provide specific version numbers for these or any other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | We set α as 10 and β as 4 for the training procedure defined in Eq. 12. To generate adversaries for robust training, we employ the Adam optimizer with a learning rate of 10 and a weight decay of 0.00002 to run for 10 iterations. To update φ and θ, we also employ the Adam optimizer, whose parameters differ between architectures as follows. Architecture parameters: (i) CNN for IMDB: We use a 1-d convolutional layer with a kernel size of 3 to extract features and then make predictions. We set the batch size as 64 and use the Adam optimizer with a learning rate of 0.005 and a weight decay of 0.0002. (ii) Bi-LSTM for IMDB: We use a bi-directional LSTM layer to process the input sequence, and then use the last hidden state to make predictions. We set the batch size as 64 and use the Adam optimizer with a learning rate of 0.005 and a weight decay of 0.0002. (iii) BOW for SNLI: We first sum up the word vectors along the sequence dimension and concatenate the encodings of the premise and the hypothesis. Then we employ an MLP of 3 layers to predict the label. We set the batch size as 512 and use the Adam optimizer with a learning rate of 0.0005 and a weight decay of 0.0002. (iv) DECOMPATTN for SNLI: We first generate context-aware vectors and then employ an MLP of 2 layers to make predictions given the context-aware vectors. We set the batch size as 256 and use Adam with a learning rate of 0.0005 and a weight decay of 0. (A hedged configuration sketch follows the table.) |
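
The Algorithm 1 row above describes the ASCC training loop only at a high level, so the following is a minimal PyTorch-style sketch of that loop. It assumes a classifier `model` that accepts embedded inputs through a hypothetical `inputs_embeds` argument, per-word substitution candidates `sub_ids` with a boolean validity mask `sub_mask`, and the inner-adversary settings quoted in the setup row (Adam, learning rate 10, weight decay 0.00002, 10 iterations). The regularization terms of Eqs. 10-12 (weighted by α and β) are omitted for brevity; this is a sketch under those assumptions, not the authors' implementation, which is available at their repository.

```python
import torch
import torch.nn.functional as F

def ascc_training_step(model, embedding, sub_ids, sub_mask, y, outer_opt,
                       inner_steps=10, inner_lr=10.0, inner_wd=2e-5):
    """One mini-batch of Algorithm 1: inner maximization over combination
    weights w_hat, then an outer update of the model parameters (theta, phi)."""
    # w: unnormalized weights over each word's substitution candidates, shape (B, L, K).
    w = torch.zeros(sub_ids.shape, device=sub_ids.device, requires_grad=True)
    inner_opt = torch.optim.Adam([w], lr=inner_lr, weight_decay=inner_wd)

    # Candidate embeddings are detached during the inner loop: only w is optimized there.
    sub_emb = embedding(sub_ids).detach()                          # (B, L, K, D)
    for _ in range(inner_steps):                                   # inner maximization (Eq. 11 sketch)
        p = F.softmax(w.masked_fill(~sub_mask, -1e9), dim=-1)      # convex weights over candidates
        v_hat = (p.unsqueeze(-1) * sub_emb).sum(dim=2)             # convex combination v_hat(x) (Eq. 10 sketch)
        loss_adv = F.cross_entropy(model(inputs_embeds=v_hat), y)
        inner_opt.zero_grad()
        (-loss_adv).backward()                                     # gradient ascent on the adversarial loss
        inner_opt.step()

    # Recompute the inner maximum with the found w_hat and minimize it w.r.t. theta and phi.
    p = F.softmax(w.detach().masked_fill(~sub_mask, -1e9), dim=-1)
    v_hat = (p.unsqueeze(-1) * embedding(sub_ids)).sum(dim=2)
    loss = F.cross_entropy(model(inputs_embeds=v_hat), y)
    outer_opt.zero_grad()
    loss.backward()
    outer_opt.step()
    return loss.item()
```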
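
For concreteness, the sketch below instantiates the reported CNN-for-IMDB configuration (a 1-d convolution with kernel size 3; batch size 64; Adam with learning rate 0.005 and weight decay 0.0002). The vocabulary size, embedding dimension, hidden width, and pooling choice are assumptions not given in the excerpt above, and the `inputs_embeds` path simply mirrors the hypothetical interface used in the training-step sketch.

```python
import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    """1-d CNN text classifier with kernel size 3, as reported for IMDB.
    Embedding dim, hidden width, and pooling are assumptions."""
    def __init__(self, vocab_size=50000, embed_dim=300, hidden=100, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, hidden, kernel_size=3, padding=1)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x_ids=None, inputs_embeds=None):
        h = inputs_embeds if inputs_embeds is not None else self.embedding(x_ids)
        h = torch.relu(self.conv(h.transpose(1, 2)))     # (B, hidden, L)
        return self.fc(h.max(dim=-1).values)             # global max pooling over the sequence

model = CNNClassifier()
# Reported outer-optimizer settings for CNN on IMDB: Adam, lr 0.005, weight decay 0.0002.
outer_opt = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=0.0002)
```

Here `model.embedding` and `outer_opt` would play the roles of the `embedding` and `outer_opt` arguments in the training-step sketch; the Bi-LSTM, BOW, and DECOMPATTN configurations differ only in the encoder and the optimizer values quoted in the table.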