Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Rethinking the Reverse-engineering of Trojan Triggers
Authors: Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on four datasets and seven different attacks demonstrate that our solution effectively defends both input-space and feature-space Trojans. It outperforms state-of-the-art reverse-engineering methods and other types of defenses in both Trojaned model detection and mitigation tasks. On average, the detection accuracy of our method is 93%. For Trojan mitigation, our method can reduce the ASR (attack success rate) to only 0.26% with the BA (benign accuracy) remaining nearly unchanged. Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE. |
| Researcher Affiliation | Academia | Zhenting Wang Rutgers University EMAIL Kai Mei Rutgers University EMAIL Hailun Ding Rutgers University EMAIL Juan Zhai Rutgers University EMAIL Shiqing Ma Rutgers University EMAIL |
| Pseudocode | Yes | Algorithm 1 Feature-space Backdoor Reverse-engineering |
| Open Source Code | Yes | Our code can be found at https://github.com/RU-System-Software-and-Security/FeatureRE. |
| Open Datasets | Yes | We use four publicly available datasets to evaluate FEATURERE, including MNIST [56], GTSRB [57], CIFAR-10 [58] and ImageNet [59]. We summarize our datasets in Table 1. |
| Dataset Splits | No | The paper mentions 'Evaluation Metrics' and 'Experiment Setup', and provides details on training data size and test metrics (BA, ASR), but does not specify a distinct validation dataset split (e.g., percentages or counts for a validation set). |
| Hardware Specification | Yes | All experiments are conducted on a Ubuntu 18.04 machine equipped with 64 CPUs and six GeForce RTX 6000 GPUs. |
| Software Dependencies | Yes | We implement FEATURERE with python 3.8 and PyTorch. |
| Experiment Setup | Yes | In the end, we determine the reverse-engineering is successful and the label yt is a Trojan target label if the attack success rate of the reversed Trojan is above a threshold value λ (80% in this paper)... By default, τ1 = 0.15, τ2 = 0.25 and τ3 = 5%. We evaluate their influences. For τ1, we calculate input space perturbations on the preprocessed inputs, and the details of the preprocessing can be found in Appendix (A.2). |
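
The decision rule quoted in the Experiment Setup row (a label is flagged as a Trojan target when the attack success rate of the reversed Trojan exceeds λ = 80%) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the list-of-predictions input, and the strict "greater than" comparison are assumptions made for illustration only, and the reverse-engineering step that produces the predictions is not shown.

```python
from typing import Sequence

# Threshold lambda from the quoted setup (80%). The constant name is hypothetical.
ASR_THRESHOLD = 0.80


def attack_success_rate(predicted_labels: Sequence[int], target_label: int) -> float:
    """Fraction of inputs carrying the reversed trigger that the model
    classifies as the candidate target label."""
    if not predicted_labels:
        raise ValueError("predicted_labels must not be empty")
    hits = sum(1 for label in predicted_labels if label == target_label)
    return hits / len(predicted_labels)


def is_trojan_target_label(
    predicted_labels: Sequence[int],
    target_label: int,
    threshold: float = ASR_THRESHOLD,
) -> bool:
    """Flag the candidate label when the ASR is above the threshold.
    'Above' is read here as a strict inequality (an assumption)."""
    return attack_success_rate(predicted_labels, target_label) > threshold


# Hypothetical predictions on ten inputs stamped with a reversed trigger.
flagged = is_trojan_target_label([3, 3, 3, 3, 3, 3, 3, 3, 3, 1], target_label=3)
```

In this hypothetical example nine of ten predictions match label 3, so the ASR is 0.9 and the label is flagged. An ASR of exactly 0.8 would not be flagged under the strict reading assumed here.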