reproducibilityindex.ai

Generating Natural Language Attacks in a Hard Label Black Box Setting

Authors: Rishabh Maheshwary, Saket Maheshwary, Vikram Pudi (pp. 13525-13533)

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the efficacy of our proposed approach through extensive experimentation and ablation studies on five state-of-the-art target models across seven benchmark datasets.
Researcher Affiliation	Academia	Rishabh Maheshwary, Saket Maheshwary and Vikram Pudi; Data Sciences and Analytics Center, Kohli Center on Intelligent Systems, International Institute of Information Technology, Hyderabad, India; {rishabh.maheshwary, saket.maheshwary}@research.iiit.ac.in, vikram@iiit.ac.in
Pseudocode	Yes	Algorithm 1 Initialisation and Search Space Reduction. Input: Test sample X, n word count in X. Output: Adversarial sample X′. 1: indices ← Randomly select 30% positions 2: X′ ← X 3: for i in indices do 4: w ← random(Syn(x_i)) // Sample a synonym 5: X′ ← Replace x_i with w in X′ 6: if C(F(X′)) = 1 then 7: break 8: for i in indices do 9: X_i ← Replace w_i with x_i in X′ 10: scr_i ← Sim(X, X_i) 11: if C(F(X_i)) = 1 then 12: Scores.insert(scr_i, x_i) 13: Sort Scores by scr_i 14: for x_i in Scores do 15: X_t ← Replace w_i with x_i in X′ 16: if C(F(X_t)) = 0 then 17: break 18: X′ ← X_t 19: return X′ // After search space reduction
Open Source Code	Yes	Code: github.com/RishabhMaheshwary/hard-label-attack
Open Datasets	Yes	(1) AG News is a multiclass news classification dataset. The description and title of each article are concatenated following (Zhang, Zhao, and LeCun 2015). (2) Yahoo Answers is a document-level topic classification dataset. The question and top answer are concatenated following (Zhang, Zhao, and LeCun 2015). (3) MR is a sentence-level binary classification dataset of movie reviews (Pang and Lee 2005). (4) IMDB is a document-level binary classification dataset of movie reviews (Maas et al. 2011). (5) Yelp Reviews is a sentiment classification dataset (Zhang, Zhao, and LeCun 2015). Reviews with ratings 1 and 2 are labeled negative and those with ratings 4 and 5 positive, as in (Jin et al. 2019). (6) SNLI is a dataset consisting of hypothesis and premise sentence pairs (Bowman et al. 2015). (7) MultiNLI is a multi-genre NLI corpus (Williams, Nangia, and Bowman 2017).
Dataset Splits	Yes	From each dataset, we held out 10% of the data as a validation set for tuning the hyper-parameters.
Hardware Specification	No	The paper does not provide specific details about the hardware used for experiments, such as GPU/CPU models or memory specifications.
Software Dependencies	No	The paper mentions using 'Universal Sentence Encoder (USE)', 'NLTK', 'Spacy', and 'Language-Tool' but does not specify their version numbers, which are required for reproducibility.
Experiment Setup	Yes	The GA parameters K and λ were set to 30 and 25, respectively. The maximum number of iterations T is set to 100. For Word LSTM, a single-layer bi-directional LSTM with 150 hidden units was used. In Word CNN, windows of sizes 3, 4 and 5, each having 150 filters, were used. For both Word CNN and Word LSTM, a dropout rate of 0.3 and 200-dimensional GloVe word embeddings were used.
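
The Pseudocode row above (Algorithm 1) can be read as the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the callables `synonyms`, `is_adversarial` and `similarity` are hypothetical stand-ins for Syn(·), the check C(F(·)) = 1 and Sim(·,·), and the function name `init_and_reduce` is invented here. The row does not state the sort direction in step 13, so descending similarity is assumed, and returning `None` when no adversarial starting point is found is likewise a choice made for this sketch.

```python
import random
from typing import Callable, List, Optional, Sequence


def init_and_reduce(
    x: Sequence[str],
    synonyms: Callable[[str], List[str]],          # hypothetical stand-in for Syn(x_i)
    is_adversarial: Callable[[List[str]], bool],   # hypothetical stand-in for C(F(X')) = 1
    similarity: Callable[[Sequence[str], Sequence[str]], float],  # stand-in for Sim
    fraction: float = 0.3,                         # "30% positions" in step 1
    rng: Optional[random.Random] = None,
) -> Optional[List[str]]:
    """Sketch of Algorithm 1: random initialisation, then search space reduction."""
    rng = rng or random.Random()
    n = len(x)
    if n == 0:
        return None

    # Steps 1-2: pick random positions and start from a copy of the input.
    k = max(1, int(fraction * n))
    indices = rng.sample(range(n), k)
    x_adv = list(x)

    # Steps 3-7: substitute random synonyms until the label flips.
    for i in indices:
        candidates = synonyms(x[i])
        if not candidates:
            continue
        x_adv[i] = rng.choice(candidates)
        if is_adversarial(x_adv):
            break
    if not is_adversarial(x_adv):
        return None  # assumption: give up instead of retrying

    # Steps 8-12: score each substituted position by the similarity obtained
    # when the original word is put back, keeping only positions whose
    # restoration still leaves the sample adversarial.
    scores = []
    for i in indices:
        if x_adv[i] == x[i]:
            continue  # position was never substituted
        x_i = list(x_adv)
        x_i[i] = x[i]
        if is_adversarial(x_i):
            scores.append((similarity(x, x_i), i))

    # Step 13: sort by score (descending order is an assumption).
    scores.sort(key=lambda pair: pair[0], reverse=True)

    # Steps 14-18: restore original words one by one while still adversarial.
    for _, i in scores:
        x_t = list(x_adv)
        x_t[i] = x[i]
        if not is_adversarial(x_t):
            break
        x_adv = x_t

    # Step 19: adversarial sample after search space reduction.
    return x_adv
```

In a real attack, `is_adversarial` would wrap a query to the target model's top label only (the hard-label setting), so each call counts against the query budget.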