reproducibilityindex.ai

Rethinking the Reverse-engineering of Trojan Triggers

Authors: Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Results on four datasets and seven different attacks demonstrate that our solution effectively defends against both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.
Researcher Affiliation	Academia	Zhenting Wang Rutgers University zhenting.wang@rutgers.edu Kai Mei Rutgers University kai.mei@rutgers.edu Hailun Ding Rutgers University hailun.ding@rutgers.edu Juan Zhai Rutgers University juan.zhai@rutgers.edu Shiqing Ma Rutgers University sm2283@rutgers.edu
Pseudocode	Yes	Algorithm 1 Feature-space Backdoor Reverse-engineering
Open Source Code	Yes	Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE.
Open Datasets	Yes	We use four publicly available datasets to evaluate FeatureRE, including MNIST [56], GTSRB [57], CIFAR-10 [58] and ImageNet [59]. We summarize our datasets in Table 1.
Dataset Splits	No	The paper mentions 'Evaluation Metrics' and 'Experiment Setup', and provides details on training data size and test metrics (BA, ASR), but does not specify a distinct validation dataset split (e.g., percentages or counts for a validation set).
Hardware Specification	Yes	All experiments are conducted on an Ubuntu 18.04 machine equipped with 64 CPUs and six GeForce RTX 6000 GPUs.
Software Dependencies	Yes	We implement FeatureRE with Python 3.8 and PyTorch.
Experiment Setup	Yes	In the end, we determine that the reverse-engineering is successful and the label y_t is a Trojan target label if the attack success rate of the reversed Trojan is above a threshold value λ (80% in this paper)... By default, τ1 = 0.15, τ2 = 0.25 and τ3 = 5%. We evaluate their influences. For τ1, we calculate input space perturbations on the preprocessed inputs, and the details of the preprocessing can be found in Appendix A.2.
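The Experiment Setup row gives a detection rule: a candidate label counts as a Trojan target if the reversed trigger's ASR exceeds λ = 80%. The Research Type row relies on two metrics, BA and ASR. Both can be sketched in plain Python as below.

This is a minimal illustration under stated assumptions, not code from the FeatureRE repository:

- All function names are hypothetical.
- Predictions are assumed to be precomputed integer class labels.
- Excluding inputs whose true label already equals the target from the ASR denominator is an assumed convention, not one stated above.

The reverse-engineering itself (Algorithm 1) and the roles of τ1, τ2 and τ3 are not reproduced here.

```python
from typing import Dict, List, Sequence

# Threshold lambda from the Experiment Setup row: 80%.
LAMBDA = 0.80


def benign_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """BA: fraction of clean inputs whose prediction matches the true label."""
    if not preds or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)


def attack_success_rate(preds_on_triggered: Sequence[int],
                        true_labels: Sequence[int],
                        target_label: int) -> float:
    """ASR: fraction of trigger-stamped inputs classified as target_label.

    Assumption: inputs whose true label already equals target_label are
    excluded, so the rate only counts label flips caused by the trigger.
    """
    pairs = [(p, y) for p, y in zip(preds_on_triggered, true_labels)
             if y != target_label]
    if not pairs:
        raise ValueError("no non-target inputs to evaluate")
    return sum(p == target_label for p, _ in pairs) / len(pairs)


def is_trojan_target(asr_of_reversed_trigger: float,
                     lam: float = LAMBDA) -> bool:
    """Decision rule: the reversal succeeds (label is a Trojan target)
    when the reversed trigger's ASR is strictly above lambda."""
    return asr_of_reversed_trigger > lam


def flag_target_labels(asr_by_label: Dict[int, float],
                       lam: float = LAMBDA) -> List[int]:
    """Hypothetical helper: apply the rule to every candidate label y_t
    and return the labels flagged as Trojan targets, sorted."""
    return sorted(label for label, asr in asr_by_label.items()
                  if is_trojan_target(asr, lam))


if __name__ == "__main__":
    # Toy predictions, not results from the paper.
    print(benign_accuracy([0, 1, 2, 2], [0, 1, 2, 3]))
    print(attack_success_rate([3, 3, 3, 3, 3], [0, 1, 2, 3, 1], target_label=3))
    print(flag_target_labels({0: 0.05, 3: 0.97, 7: 0.12}))
```

For example, `flag_target_labels({0: 0.05, 3: 0.97, 7: 0.12})` returns `[3]`. Reporting a model as Trojaned whenever this list is non-empty is a natural way to aggregate the per-label results. That aggregation is an assumption, not something this excerpt specifies.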