Causality Based Front-door Defense Against Backdoor Attack on Language Models

Authors: Yiran Liu, Xiaoang Xu, Zhiyi Hou, Yang Yu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our defense experiments against various attack methods at the token, sentence, and syntactic levels reduced the attack success rate from 93.63% to 15.12%, improving the defense effect by 2.91 times compared to the best baseline result of 66.61%, achieving state-of-the-art results. (The 2.91x figure is verified in a short sketch after the table.)
Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (2) School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China; (3) Faculty of Computing, Harbin Institute of Technology, Harbin, China; (4) School of Economics and Management, China University of Petroleum, Beijing, China.
Pseudocode | No | The paper describes the framework's modules and mathematical formulas but does not provide structured pseudocode or algorithm blocks. (The generic front-door adjustment formula behind the approach is sketched after the table.)
Open Source Code | Yes | Our code to reproduce the experiments is available at: https://github.com/lyr17/Frontdoor-Adjustment-Backdoor-Elimination.
Open Datasets | Yes | The datasets we use are SST-2 (Socher et al., 2013), Offenseval (Zampieri et al., 2020) and HSOL (Davidson et al., 2017). (A loading sketch follows the table.)
Dataset Splits | Yes | The details of the datasets and victim models are shown in Table 3 and Table 4. Table 3 lists per-split example counts, including a dev split, for SST-2, Offenseval and HSOL.
Hardware Specification | Yes | Model training leverages eight Nvidia V100 GPUs, using Adam (Kingma & Ba, 2014) for optimization with a learning rate of 1 × 10⁻⁵ and 1000 warmup steps.
Software Dependencies | No | The paper mentions using Adam for optimization and the Transformers library, but does not specify software versions for these or other key components (e.g., Python or PyTorch).
Experiment Setup | Yes | Model training leverages eight Nvidia V100 GPUs, using Adam (Kingma & Ba, 2014) for optimization with a learning rate of 1 × 10⁻⁵ and 1000 warmup steps. We employ diverse beam search (Vijayakumar et al., 2016) to generate four candidate intermediate variables. The margin coefficient λ in Equation (12) is 0.1, while the length normalization term α in the model score function is 2.0 across datasets. The MLE loss weight β is set at 1.0 (Equation 13). (A configuration sketch follows the table.)
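A quick check on the Research Type row: the quoted 2.91x improvement is recoverable from the reported numbers if "defense effect" is read as the absolute drop in attack success rate (ASR). That reading is ours, not the paper's; a minimal sketch:

```python
# Sanity check (our reading, not the paper's): "defense effect" taken as the
# absolute drop in attack success rate (ASR) relative to no defense.
asr_no_defense = 93.63       # average ASR with no defense (%)
asr_front_door = 15.12       # average ASR under the front-door defense (%)
asr_best_baseline = 66.61    # average ASR under the best baseline defense (%)

drop_front_door = asr_no_defense - asr_front_door        # 78.51 points
drop_best_baseline = asr_no_defense - asr_best_baseline  # 27.02 points

print(round(drop_front_door / drop_best_baseline, 2))    # -> 2.91
```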
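On the Pseudocode row: while the paper gives formulas rather than algorithm blocks, the method's namesake is Pearl's front-door adjustment. For orientation, the standard formula in generic notation (the paper's own equations may differ) is:

```latex
% Pearl's front-door adjustment: the causal effect of input X on label Y is
% identified through a mediator Z (here, a generated intermediate variable),
% even when X and Y share an unobserved confounding path -- one way to model
% the spurious shortcut a backdoor trigger creates.
P(y \mid \mathrm{do}(x)) = \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')
```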
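On the Open Datasets row, all three corpora are public. As an illustration only (the paper does not say how the data was obtained, and the hub IDs below are our assumption), SST-2 can be fetched with the HuggingFace datasets library:

```python
# Minimal sketch (ours, not the paper's): fetching SST-2 via HuggingFace's
# `datasets` library. The "glue"/"sst2" hub IDs are an assumption about
# sourcing; the paper only cites Socher et al. (2013).
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")  # splits: train / validation / test
print({split: ds.num_rows for split, ds in sst2.items()})
```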
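On the Experiment Setup row, here is a minimal sketch of how the quoted optimizer, warmup, and diverse-beam-search settings might be wired up with PyTorch and Transformers. The model name, scheduler shape, total step count, diversity penalty, and the mapping of α onto length_penalty are our assumptions; the margin loss of Equation (12) is not reproduced here.

```python
# Sketch of the reported training/decoding configuration. Quoted values:
# Adam, lr = 1e-5, 1000 warmup steps, 4 diverse-beam-search candidates,
# length normalization alpha = 2.0. Everything else below is an assumption.
import torch
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          get_linear_schedule_with_warmup)

model_name = "facebook/bart-base"  # hypothetical; the generator is not named in this excerpt
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)      # as reported
scheduler = get_linear_schedule_with_warmup(                   # scheduler shape assumed
    optimizer, num_warmup_steps=1000, num_training_steps=100_000)  # total steps assumed

# Diverse beam search (Vijayakumar et al., 2016) producing four candidate
# intermediate variables; treating the paper's alpha as HF's length_penalty
# exponent is our assumption.
inputs = tokenizer("a possibly poisoned input sentence", return_tensors="pt")
candidates = model.generate(
    **inputs,
    num_beams=4,
    num_beam_groups=4,       # one beam per group, as in standard diverse beam search
    diversity_penalty=1.0,   # value assumed; not reported in the excerpt
    num_return_sequences=4,
    length_penalty=2.0,      # alpha = 2.0 length normalization
)
print(tokenizer.batch_decode(candidates, skip_special_tokens=True))
```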