Certified Robustness Against Natural Language Attacks by Causal Intervention
Authors: Haiteng Zhao, Chang Ma, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng, Hanwang Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our merits by extensive experiments considering both seen word substitution attacks (Jia et al., 2019; Dong et al., 2021a) and unseen syntactic-trigger-based (Qi et al., 2021) and editing distance-based (Levenshtein et al., 1966; Liang et al., 2018) attacks. For example, on IMDB, CISS achieves 76.5% certified robust accuracy against adversarial word substitutions, surpassing the runner-up by 7.2%; on YELP, CISS achieves 83.1% empirical robustness against integrated attacks, surpassing the runner-up by 7.8%. |
| Researcher Affiliation | Academia | 1Peking University 2Carnegie Mellon University 3Nanyang Technological University 4Corresponding Author. Correspondence to: Anh Tuan Luu <anhtuan.luu@ntu.edu.sg>. |
| Pseudocode | Yes | Algorithm 1 Training of CISS |
| Open Source Code | Yes | Our code is available at https://github.com/zhao-ht/Convex_Certify. |
| Open Datasets | Yes | Following previous state-of-the-arts (Jia et al., 2019; Ye et al., 2020), we examine the certified robustness by text classification tasks, and we choose the prevailing YELP (Shen et al., 2017) and IMDB (Maas et al., 2011) datasets. |
| Dataset Splits | No | The paper refers to a 'test set' but does not give the train/validation/test split percentages or absolute sample counts for all splits that would be needed to reproduce the experiment. |
| Hardware Specification | Yes | However, this consumes around 12 hours to complete the certification using a Tesla V100 for IMDB test set of size 25000. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | For hyper-parameters, we set σ = 1, γ = 4.0, and margin m = 1.0 (ablation on hyperparameters in section 4.6). These parameters are tuned to achieve the best certified robustness as shown in 4.6. During training, we first use loss Lcls to optimize the model to convergence, and then add loss Lrobust for training. Warm-up is used on γ during optimization. During training, we sample only 1 time from the Gaussian to perform smoothing. For ASCC attack, we run for 10 iterations to find the worst-case attack, and then discretize the attack into textual adversarial examples. In editing attack, we use an editing distance of 10 and 50 on YELP and IMDB, respectively. |
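The experiment-setup excerpt describes a two-stage schedule: train with Lcls to convergence, then add γ-weighted Lrobust with warm-up on γ, drawing a single Gaussian sample (σ = 1) per step for smoothing. A minimal sketch of that schedule follows; the function names, the linear warm-up, and the warm-up length are assumptions for illustration, not the authors' exact implementation from the CISS codebase.

```python
import numpy as np

SIGMA = 1.0      # Gaussian smoothing scale (paper: sigma = 1)
GAMMA_MAX = 4.0  # robust-loss weight after warm-up (paper: gamma = 4.0)
MARGIN = 1.0     # margin m = 1.0 (used inside Lrobust; not modeled here)

def warmup_gamma(step, warmup_steps=1000):
    """Linearly warm gamma from 0 to GAMMA_MAX (schedule is assumed)."""
    return GAMMA_MAX * min(1.0, step / warmup_steps)

def smooth(z, rng):
    """One Gaussian smoothing sample per step, as the excerpt states."""
    return z + SIGMA * rng.standard_normal(z.shape)

def combined_loss(l_cls, l_robust, step, stage2=True):
    """Stage 1: Lcls only until convergence; stage 2: add warmed Lrobust."""
    if not stage2:
        return l_cls
    return l_cls + warmup_gamma(step) * l_robust

rng = np.random.default_rng(0)
z_noisy = smooth(np.zeros(8), rng)              # smoothed latent feature
print(combined_loss(0.5, 0.2, step=500))        # gamma(500) = 2.0 -> 0.9
```

The single-sample smoothing keeps the training cost close to standard fine-tuning; the quoted 12-hour Tesla V100 figure applies only to the certification pass over the 25,000-example IMDB test set.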