Causality Based Front-door Defense Against Backdoor Attack on Language Models
Authors: Yiran Liu, Xiaoang Xu, Zhiyi Hou, Yang Yu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our defense experiments against various attack methods at the token, sentence, and syntactic levels reduced the attack success rate from 93.63% to 15.12%, improving the defense effect by 2.91 times compared to the best baseline result of 66.61%, achieving state-of-the-art results. |
| Researcher Affiliation | Academia | ¹Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; ²School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China; ³Faculty of Computing, Harbin Institute of Technology, Harbin, China; ⁴School of Economics and Management, China University of Petroleum, Beijing, China. |
| Pseudocode | No | The paper describes the framework's modules and mathematical formulas but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code to reproduce the experiments is available at: https://github.com/lyr17/Frontdoor-Adjustment-Backdoor-Elimination. |
| Open Datasets | Yes | The datasets we use are SST-2 (Socher et al., 2013), Offenseval (Zampieri et al., 2020) and HSOL (Davidson et al., 2017). |
| Dataset Splits | Yes | The details of the datasets and victim models are shown in Table 3 and Table 4. Table 3: Details of SST-2, Offenseval and HSOL (the table lists data counts, including a 'dev' split). |
| Hardware Specification | Yes | Model training leverages eight Nvidia V100 GPUs, using Adam (Kingma & Ba, 2014) for optimization with a learning rate of 1×10⁻⁵ and 1000 warmup steps. |
| Software Dependencies | No | The paper mentions using 'Adam' for optimization and 'Transformers library' but does not specify software versions for these or other key components (e.g., Python, PyTorch version). |
| Experiment Setup | Yes | Model training leverages eight Nvidia V100 GPUs, using Adam (Kingma & Ba, 2014) for optimization with a learning rate of 1×10⁻⁵ and 1000 warmup steps. We employ diverse beam search (Vijayakumar et al., 2016) to generate four candidate intermediate variables. The margin coefficient λ in Equation (12) is 0.1, while the length normalization term α in the model score function is 2.0 across datasets. The MLE loss weight β is set at 1.0 (Equation 13). (A hedged configuration sketch follows the table.) |
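
The experiment setup quoted above maps onto a small training/decoding configuration. The sketch below, assuming PyTorch and the Hugging Face Transformers library, encodes only the hyperparameters stated in the paper's text (Adam, learning rate 1×10⁻⁵, 1000 warmup steps, four diverse-beam candidates, λ = 0.1, α = 2.0, β = 1.0); the base model name, total training steps, diversity penalty, and the loss terms themselves are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the reported setup; hyperparameter values come from the
# paper's text, everything else (model choice, step count, penalty) is assumed.
import torch
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          get_linear_schedule_with_warmup)

model_name = "facebook/bart-base"  # assumption: the paper's generator is not named in this table
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Optimizer and schedule as reported: Adam, lr 1e-5, 1000 warmup steps.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1000, num_training_steps=10_000  # total steps assumed
)

# Diverse beam search producing four candidate intermediate variables,
# mirroring "four candidate intermediate variables" in the setup row.
inputs = tokenizer("the movie was surprisingly good", return_tensors="pt")
candidates = model.generate(
    **inputs,
    num_beams=4,
    num_beam_groups=4,
    num_return_sequences=4,
    diversity_penalty=1.0,   # assumption: penalty value is not reported
    max_new_tokens=32,
)
print(tokenizer.batch_decode(candidates, skip_special_tokens=True))

# Loss-related constants quoted from Equations (12)-(13); the margin and MLE
# terms themselves are placeholders since their exact forms are not quoted here.
margin_lambda = 0.1   # margin coefficient lambda
length_alpha = 2.0    # length-normalization exponent alpha in the score function
mle_beta = 1.0        # MLE loss weight beta
```

Training across the eight V100s mentioned in the hardware row would typically wrap this loop in `torch.distributed` or a launcher such as Accelerate; that plumbing is omitted here for brevity.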