Rethinking the Reverse-engineering of Trojan Triggers
Authors: Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on four datasets and seven different attacks demonstrate that our solution effectively defends both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE. |
| Researcher Affiliation | Academia | Zhenting Wang, Rutgers University, zhenting.wang@rutgers.edu; Kai Mei, Rutgers University, kai.mei@rutgers.edu; Hailun Ding, Rutgers University, hailun.ding@rutgers.edu; Juan Zhai, Rutgers University, juan.zhai@rutgers.edu; Shiqing Ma, Rutgers University, sm2283@rutgers.edu |
| Pseudocode | Yes | Algorithm 1 Feature-space Backdoor Reverse-engineering |
| Open Source Code | Yes | Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE. |
| Open Datasets | Yes | We use four publicly available datasets to evaluate FEATURERE, including MNIST [56], GTSRB [57], CIFAR-10 [58] and ImageNet [59]. We summarize our datasets in Table 1. |
| Dataset Splits | No | The paper mentions 'Evaluation Metrics' and 'Experiment Setup', and provides details on training data size and test metrics (BA, ASR), but does not specify a distinct validation dataset split (e.g., percentages or counts for a validation set). |
| Hardware Specification | Yes | All experiments are conducted on a Ubuntu 18.04 machine equipped with 64 CPUs and six GeForce RTX 6000 GPUs. |
| Software Dependencies | Yes | We implement FEATURERE with Python 3.8 and PyTorch. |
| Experiment Setup | Yes | In the end, we determine the reverse-engineering is successful and the label y_t is a Trojan target label if the attack success rate of the reversed Trojan is above a threshold value λ (80% in this paper)... By default, τ1 = 0.15, τ2 = 0.25 and τ3 = 5%. We evaluate their influences. For τ1, we calculate input space perturbations on the preprocessed inputs, and the details of the preprocessing can be found in Appendix (A.2). |
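
The decision rule quoted in the Experiment Setup row (flag a label as a Trojan target when the reversed trigger's attack success rate exceeds λ = 80%) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `attack_success_rate` and `is_trojan_target` are hypothetical, and the predictions are assumed to be the model's output labels on inputs that already carry the reversed trigger.

```python
from typing import Sequence

# Detection threshold on the attack success rate (80%, as quoted above).
LAMBDA = 0.80


def attack_success_rate(predictions: Sequence[int], target_label: int) -> float:
    """Fraction of trigger-stamped inputs that the model assigns to target_label.

    `predictions` is assumed to hold predicted class indices for inputs that
    already carry the reversed trigger (hypothetical interface).
    """
    if len(predictions) == 0:
        raise ValueError("predictions must be non-empty")
    hits = sum(1 for p in predictions if p == target_label)
    return hits / len(predictions)


def is_trojan_target(
    predictions: Sequence[int], target_label: int, threshold: float = LAMBDA
) -> bool:
    """Return True if the ASR is strictly above the threshold."""
    return attack_success_rate(predictions, target_label) > threshold


if __name__ == "__main__":
    # Nine of ten triggered inputs land on label 3, so the ASR is 0.9.
    preds = [3] * 9 + [1]
    print(attack_success_rate(preds, 3))
    print(is_trojan_target(preds, 3))
```

The comparison is strict because the quoted text says the rate must be "above" the threshold; an ASR of exactly 80% would not be flagged under this reading.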