Defending against Backdoor Attacks in Natural Language Generation
Authors: Xiaofei Sun, Xiaoya Li, Yuxian Meng, Xiang Ao, Lingjuan Lyu, Jiwei Li, Tianwei Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, by giving a formal definition of backdoor attack and defense, we investigate this problem on two important NLG tasks, machine translation and dialog generation. Tailored to the inherent nature of NLG models (e.g., producing a sequence of coherent words given contexts), we design defending strategies against attacks. We find that testing the backward probability of generating sources given targets yields effective defense performance against all different types of attacks, and is able to handle the one-to-many issue in many NLG tasks such as dialog generation. *(A minimal sketch of this backward-probability check appears after the table.)* |
| Researcher Affiliation | Collaboration | Xiaofei Sun (1), Xiaoya Li (2), Yuxian Meng (2), Xiang Ao (3), Lingjuan Lyu (4), Jiwei Li (1,2) and Tianwei Zhang (5); 1: Zhejiang University, 2: Shannon.AI, 3: Chinese Academy of Sciences, 4: Sony AI, 5: Nanyang Technological University |
| Pseudocode | No | The paper describes methods and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides links to third-party toolkits used in the work, such as Fairseq (https://github.com/pytorch/fairseq) and SacreBLEU (https://github.com/mjpost/sacrebleu). However, there is no explicit statement or link indicating that the authors' own source code for the described methodology or experiments is publicly available. *(A minimal SacreBLEU usage sketch appears after the table.)* |
| Open Datasets | Yes | For MT, we use the constructed IWSLT-2014 English-German and WMT-2014 English-German benchmarks. [...] We use OpenSubtitles2012 (Tiedemann 2012), a widely-used open-domain dialog dataset for benchmark construction. |
| Dataset Splits | Yes | We take the original train, valid and test sets as the corresponding clean sets D^train_clean, D^valid_clean and D^test_clean. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU/CPU models, memory, or cloud instance types. |
| Software Dependencies | No | The paper mentions using Fairseq (Ott et al. 2019) and the Adam optimizer, but does not provide specific version numbers for these or other software dependencies like Python, PyTorch, or specific libraries. |
| Experiment Setup | Yes | For the IWSLT2014 En-De dataset, we train the model with warmup and max-tokens respectively set to 4096 and 30000. The learning rate is set to 1e-4. Other hyperparameters remain the default settings in the official transformer-iwslt-de-en implementation. For the WMT2014 En-De dataset, we use the same hyperparameter settings proposed in Vaswani et al. (2017b). For training, we use cross entropy with 0.1 smoothing and Adam (β=(0.9, 0.98), ϵ=1e-9) as the optimizer. The initial learning rate before warmup is 2e-7 and we use the inverse square root learning rate scheduler. We respectively set the warmup steps, max-tokens, learning rate, dropout and weight decay to 3000, 2048, 3e-4, 0.1 and 0.0002. *(A sketch mapping the IWSLT settings onto fairseq-train flags appears after the table.)* |
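
The defense quoted in the Research Type row scores how plausibly a backward (target-to-source) model regenerates the original source; abnormally low scores suggest a triggered output. The sketch below is a minimal illustration of that idea, assuming a backward model that exposes per-step logits over the source vocabulary; the interface, the length normalisation, and the threshold are assumptions, not the authors' released implementation.

```python
# Minimal sketch of the backward-probability check, assuming a backward
# (target -> source) model that exposes per-step logits over the source
# vocabulary. The scoring rule, length normalisation and threshold are
# illustrative assumptions, not the authors' released implementation.
import torch
import torch.nn.functional as F


def backward_log_prob(logits: torch.Tensor, source_ids: torch.Tensor) -> float:
    """Length-normalised log P(source | target) from per-step logits.

    logits:     (src_len, vocab) scores from a backward (tgt -> src) model
    source_ids: (src_len,) gold source token ids
    """
    log_probs = F.log_softmax(logits, dim=-1)                 # (src_len, vocab)
    token_lp = log_probs.gather(1, source_ids.unsqueeze(1))   # (src_len, 1)
    return (token_lp.sum() / source_ids.numel()).item()


def is_suspicious(logits: torch.Tensor, source_ids: torch.Tensor,
                  threshold: float = -6.0) -> bool:
    """Flag inputs whose backward score falls below a threshold; the value
    here is a placeholder that would be calibrated on clean validation data."""
    return backward_log_prob(logits, source_ids) < threshold


if __name__ == "__main__":
    # Random tensors stand in for real backward-model outputs.
    vocab, src_len = 100, 7
    fake_logits = torch.randn(src_len, vocab)
    fake_source = torch.randint(0, vocab, (src_len,))
    print(backward_log_prob(fake_logits, fake_source),
          is_suspicious(fake_logits, fake_source))
```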
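
The Open Source Code row cites SacreBLEU as one of the linked toolkits. A minimal usage sketch, assuming the standard `sacrebleu` Python package from the linked repository; the sentences are placeholders.

```python
# Minimal SacreBLEU usage sketch (https://github.com/mjpost/sacrebleu), the
# evaluation toolkit the paper links to; the sentences are placeholders.
import sacrebleu

hypotheses = ["the cat sat on the mat"]
references = [["the cat is sitting on the mat"]]  # one inner list per reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```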
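
The Experiment Setup row lists the IWSLT2014 En-De hyperparameters in prose. The sketch below maps them onto standard fairseq-train flags; the data directory, the use of the inverse square root scheduler for this dataset, and the exact flag-to-prose mapping are assumptions, with unlisted options left at the fairseq defaults as the quote states.

```python
# Hedged reconstruction of the quoted IWSLT2014 En-De settings as fairseq-train
# arguments. The data directory, the choice of the inverse square root scheduler
# for this dataset, and the flag-to-prose mapping are assumptions; unlisted
# options fall back to fairseq defaults, as the quote states.
import shlex

iwslt_en_de_cmd = [
    "fairseq-train", "data-bin/iwslt14_en_de",   # assumed preprocessed data dir
    "--arch", "transformer_iwslt_de_en",         # the "official transformer-iwslt-de-en" recipe
    "--optimizer", "adam",
    "--adam-betas", "(0.9, 0.98)",
    "--adam-eps", "1e-9",
    "--criterion", "label_smoothed_cross_entropy",
    "--label-smoothing", "0.1",
    "--lr", "1e-4",
    "--lr-scheduler", "inverse_sqrt",
    "--warmup-updates", "4096",
    "--max-tokens", "30000",
]
print(shlex.join(iwslt_en_de_cmd))  # paste the printed command into a shell
```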