Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

One-shot Neural Backdoor Erasing via Adversarial Weight Masking

Authors: Shuwen Chai, Jinghui Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we conduct thorough experiments to verify the effectiveness of our proposed AWM method and analyze its sensitivity to hyper-parameters via ablation studies."
Researcher Affiliation | Academia | Shuwen Chai (Renmin University of China), Jinghui Chen (Pennsylvania State University)
Pseudocode | Yes |
Algorithm 1 Adversarial Weight Masking (AWM)
Input: Infected DNN f with weights θ, clean dataset D = {(x_i, y_i)}_{i=1}^n, batch size b, learning rates η1, η2, hyper-parameters α, β, γ, epochs E, inner iteration loops T, L1 norm bound τ.
1: Initialize all elements in m as 1
2: for i = 1 to E do
3:   Initialize δ as 0  // Phase 1: Inner Optimization
4:   for t = 1 to T do
5:     Sample a minibatch (x, y) from D with size b
6:     L_inner = L(f(x + δ; m ⊙ θ), y)
7:     δ = δ + η1 ∇_δ L_inner
8:   end for
9:   Clip δ: δ = δ · min(1, τ / ‖δ‖₁)  // Phase 2: Outer Optimization
10:  for t = 1 to T do
11:    L_outer = α L(f(x; m ⊙ θ), y) + β L(f(x + δ; m ⊙ θ), y) + γ ‖m‖₁
12:    m = m − η2 ∇_m L_outer
13:    Clip m to [0, 1]
14:  end for
15: end for
Output: Filter masks m for weights in network f.
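The two-phase loop in Algorithm 1 can be sketched in plain NumPy for a toy linear softmax classifier. This is an illustrative sketch only, not the authors' implementation: the function name `awm_sketch`, the analytic cross-entropy gradients, the use of one mask entry per weight (the paper masks whole convolutional filters), and applying the perturbation to the full clean set rather than minibatches are all simplifying assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def awm_sketch(W, X, Y, alpha=0.9, beta=0.1, gamma=1e-7,
               tau=10.0, eta1=0.1, eta2=0.05, epochs=3, T=5):
    """Toy AWM loop for a linear classifier f(x) = softmax((m * W) @ x)."""
    m = np.ones_like(W)                          # soft weight mask, init to 1
    for _ in range(epochs):
        # Phase 1: inner maximization -- recover a universal perturbation delta.
        delta = np.zeros(X.shape[1])
        for _ in range(T):
            for x, y in zip(X, Y):
                p = softmax((m * W) @ (x + delta))
                p[y] -= 1.0                      # dL/dlogits for cross-entropy
                delta += eta1 * (m * W).T @ p    # gradient ascent on the loss
        # Project delta onto the L1 ball of radius tau.
        n1 = np.abs(delta).sum()
        if n1 > tau:
            delta *= tau / n1
        # Phase 2: outer minimization -- update the mask m, then clip to [0, 1].
        for _ in range(T):
            grad_m = gamma * np.sign(m)          # subgradient of gamma * ||m||_1
            for x, y in zip(X, Y):
                for w, inp in ((alpha, x), (beta, x + delta)):
                    p = softmax((m * W) @ inp)
                    p[y] -= 1.0
                    grad_m += w * np.outer(p, inp) * W
            m -= eta2 * grad_m
            np.clip(m, 0.0, 1.0, out=m)
    return m
```

The clipping step keeps m a soft mask in [0, 1], so mask entries driven to 0 effectively prune the corresponding weights.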
Open Source Code | Yes | "3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]"
Open Datasets | Yes | "Datasets and Networks. We conduct experiments on two datasets: CIFAR-10 [26] and GTSRB [20]."
Dataset Splits | No | CIFAR-10 contains 50,000 training images and 10,000 test images across 10 classes. While train/test counts are given, the paper does not detail a separate validation split for the main experiments; the small set of 'available data' used by the defense itself serves as its training/validation set.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments (e.g., GPU model, CPU type, memory).
Software Dependencies | No | The paper mentions 'PyTorch [39]' but does not provide version numbers for the software dependencies needed for reproducibility.
Experiment Setup | Yes | "We test with α ∈ [0.5, 0.8], β = 1 − α, γ ∈ [10⁻⁸, 10⁻⁵], τ ∈ [10, 3000] and show the performance changes under the Trojan-SQ attack with 500 training data. When varying the value of one specific hyper-parameter, we fix the others to the default values α₀ = 0.9, γ₀ = 10⁻⁷, τ₀ = 1000."
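The one-at-a-time sensitivity sweep described in the quote could be organized as below. This is a hedged sketch: `run_awm` is a hypothetical placeholder for a full defense run, and the intermediate grid points are illustrative; only the range endpoints and the defaults come from the quote.

```python
# One-at-a-time ablation: vary a single hyper-parameter while holding
# the others at the reported defaults (alpha0=0.9, gamma0=1e-7, tau0=1000).
defaults = {"alpha": 0.9, "gamma": 1e-7, "tau": 1000}
grids = {
    "alpha": [0.5, 0.6, 0.7, 0.8],        # endpoints from the quoted ranges;
    "gamma": [1e-8, 1e-7, 1e-6, 1e-5],    # interior points are illustrative
    "tau":   [10, 100, 1000, 3000],
}

def run_awm(**hp):
    """Placeholder: would train and evaluate the AWM defense with hyper-params hp."""
    return {"clean_acc": None, "attack_success_rate": None}

results = {}
for name, values in grids.items():
    for v in values:
        hp = {**defaults, name: v}
        hp["beta"] = 1 - hp["alpha"]      # the paper couples beta = 1 - alpha
        results[(name, v)] = run_awm(**hp)
```

Coupling β to α means only three free hyper-parameters need to be swept, which keeps the ablation at one run per grid point.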