Certified Robustness Against Natural Language Attacks by Causal Intervention
Authors: Haiteng Zhao, Chang Ma, Xinshuai Dong, Anh Tuan Luu, Zhi-Hong Deng, Hanwang Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our merits by extensive experiments considering both seen word substitution attacks (Jia et al., 2019; Dong et al., 2021a) and unseen syntactic-trigger-based (Qi et al., 2021) and editing distance-based (Levenshtein et al., 1966; Liang et al., 2018) attacks. For example, on IMDB, CISS achieves 76.5% certified robust accuracy against adversarial word substitutions, surpassing the runner-up by 7.2%; on YELP, CISS achieves 83.1% empirical robustness against integrated attacks, surpassing the runner-up by 7.8%. |
| Researcher Affiliation | Academia | 1Peking University 2Carnegie Mellon University 3Nanyang Technological University 4Corresponding Author. Correspondence to: Anh Tuan Luu <anhtuan.luu@ntu.edu.sg>. |
| Pseudocode | Yes | Algorithm 1 Training of CISS |
| Open Source Code | Yes | Our code is available at https://github.com/zhao-ht/Convex_Certify. |
| Open Datasets | Yes | Following previous state-of-the-arts (Jia et al., 2019; Ye et al., 2020), we examine the certified robustness by text classification tasks, and we choose the prevailing YELP (Shen et al., 2017) and IMDB (Maas et al., 2011) datasets. |
| Dataset Splits | No | The paper refers to a 'test set' but does not give the train/validation/test split percentages or absolute sample counts for all splits that would be needed to reproduce the experiment. |
| Hardware Specification | Yes | However, this consumes around 12 hours to complete the certification using a Tesla V100 for IMDB test set of size 25000. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | For hyper-parameters, we set σ = 1, γ = 4.0, and margin m = 1.0 (ablation on hyperparameters in section 4.6). These parameters are tuned to achieve the best certified robustness as shown in 4.6. During training, we first use loss Lcls to optimize the model to convergence, and then add loss Lrobust for training. Warm-up is used on γ during optimization. During training, we sample only 1 time from the Gaussian to perform smoothing. For ASCC attack, we run for 10 iterations to find the worst-case attack, and then discretize the attack into textual adversarial examples. In editing attack, we use an editing distance of 10 and 50 on YELP and IMDB, respectively. |
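The experiment-setup excerpt describes a two-stage schedule: train with Lcls to convergence, then add γ-weighted Lrobust with warm-up on γ, drawing a single Gaussian sample (σ = 1) per step for smoothing. A minimal sketch of that schedule follows; the function names, the linear warm-up, and the warm-up length are assumptions for illustration, not the authors' exact implementation from the CISS codebase.

```python
import numpy as np

SIGMA = 1.0      # Gaussian smoothing scale (paper: sigma = 1)
GAMMA_MAX = 4.0  # robust-loss weight after warm-up (paper: gamma = 4.0)
MARGIN = 1.0     # margin m = 1.0 (used inside Lrobust; not modeled here)

def warmup_gamma(step, warmup_steps=1000):
    """Linearly warm gamma from 0 to GAMMA_MAX (schedule is assumed)."""
    return GAMMA_MAX * min(1.0, step / warmup_steps)

def smooth(z, rng):
    """One Gaussian smoothing sample per step, as the excerpt states."""
    return z + SIGMA * rng.standard_normal(z.shape)

def combined_loss(l_cls, l_robust, step, stage2=True):
    """Stage 1: Lcls only until convergence; stage 2: add warmed Lrobust."""
    if not stage2:
        return l_cls
    return l_cls + warmup_gamma(step) * l_robust

rng = np.random.default_rng(0)
z_noisy = smooth(np.zeros(8), rng)              # smoothed latent feature
print(combined_loss(0.5, 0.2, step=500))        # gamma(500) = 2.0 -> 0.9
```

The single-sample smoothing keeps the training cost close to standard fine-tuning; the quoted 12-hour Tesla V100 figure applies only to the certification pass over the 25,000-example IMDB test set.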