UNICORN: A Unified Backdoor Trigger Inversion Framework

Authors: Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our prototype UNICORN is general and effective in inverting backdoor triggers in DNNs. The code can be found at https://github.com/RU-System-Software-and-Security/UNICORN. ... Based on the devised optimization problem, we implemented a prototype UNICORN (Unified Backdoor Trigger Inversion) in PyTorch and evaluated it on nine different models and eight different backdoor attacks (i.e., Patch attack (Gu et al., 2017), Blend attack (Chen et al., 2017), SIG (Barni et al., 2019), moon filter, kelvin filter, 1977 filter (Liu et al., 2019), WaNet (Nguyen & Tran, 2021) and BppAttack (Wang et al., 2022c)) on the CIFAR-10 and ImageNet datasets. Results show UNICORN is effective for inverting various types of backdoor triggers. On average, the attack success rate of the inverted triggers is 95.60%, outperforming existing trigger inversion methods. (An illustrative attack-success-rate sketch is given after this table.)
Researcher Affiliation | Academia | Zhenting Wang, Kai Mei, Juan Zhai, Shiqing Ma; Department of Computer Science, Rutgers University; {zhenting.wang,kai.mei,juan.zhai,sm2283}@rutgers.edu
Pseudocode | No | The paper formulates an optimization problem (Eq. 4 and 5) and describes its implementation, but it does not include a separately labeled 'Pseudocode' or 'Algorithm' block. (A hedged, generic trigger-inversion sketch is given after this table.)
Open Source Code | Yes | The code can be found at https://github.com/RU-System-Software-and-Security/UNICORN.
Open Datasets | Yes | Two publicly available datasets (i.e., CIFAR-10 (Krizhevsky et al., 2009) and ImageNet (Russakovsky et al., 2015)) are used to evaluate the effectiveness of UNICORN. The details of the datasets can be found in Appendix A.4. ... CIFAR-10 (Krizhevsky et al., 2009). This dataset is built for recognizing general objects such as dogs, cars, and planes. It contains 50000 training samples and 10000 training samples in 10 classes. ImageNet (Russakovsky et al., 2015). This dataset is a large-scale object classification benchmark. In this paper, we use a subset of the original ImageNet dataset specified in Li et al. (Li et al., 2021d). The subset has 100000 training samples and 10000 test samples in 200 classes. (An illustrative data-loading sketch is given after this table.)
Dataset Splits | No | The paper specifies training and test sample counts for CIFAR-10 and ImageNet (e.g., '50000 training samples and 10000 training samples' for CIFAR-10, where the second figure is likely a typo for test samples, and '100000 training samples and 10000 test samples' for ImageNet). However, it does not define a distinct validation split with specific counts or percentages.
Hardware Specification | Yes | We conduct all experiments on an Ubuntu 20.04 server equipped with 64 CPUs and six Quadro RTX 6000 GPUs.
Software Dependencies | Yes | Our method is implemented with Python 3.8 and PyTorch 1.11.
Experiment Setup | Yes | By default, we set α = 0.01, β as 10% of the input space, γ = 0.85, and δ = 0.5. (These defaults are collected into an illustrative configuration sketch after this table.)
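Since the paper presents UNICORN only as an optimization problem (Eq. 4 and 5) rather than as pseudocode, the following is a minimal, hypothetical sketch of a generic input-space trigger-inversion loop of the kind such a formulation implies. The function and variable names, the mask/pattern trigger form, and the role of alpha are assumptions for illustration; the authors' actual implementation is in the linked repository.

```python
import torch
import torch.nn.functional as F

def invert_trigger(model, images, target_label, steps=500, lr=0.1, alpha=0.01):
    """Illustrative sketch of generic trigger inversion (not the authors' code).

    Optimizes a mask (where the trigger goes) and a pattern (what it looks like)
    so that stamped inputs are classified as target_label, with a regularizer
    that keeps the trigger small. The exact objective in Eq. 4/5 differs.
    """
    mask = torch.zeros_like(images[:1], requires_grad=True)
    pattern = torch.rand_like(images[:1], requires_grad=True)
    optimizer = torch.optim.Adam([mask, pattern], lr=lr)
    target = torch.full((images.size(0),), target_label,
                        dtype=torch.long, device=images.device)

    model.eval()
    for _ in range(steps):
        m = torch.sigmoid(mask)                                   # keep mask in [0, 1]
        stamped = (1 - m) * images + m * torch.sigmoid(pattern)   # stamp the candidate trigger
        loss = F.cross_entropy(model(stamped), target) + alpha * m.abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.sigmoid(mask).detach(), torch.sigmoid(pattern).detach()
```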
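The 95.60% figure reported under Research Type is an attack success rate: the fraction of inputs stamped with an inverted trigger that the model classifies as the attack target. A minimal measurement sketch, again assuming an input-space mask/pattern trigger and hypothetical names, might look like:

```python
import torch

@torch.no_grad()
def attack_success_rate(model, loader, mask, pattern, target_label):
    """Hypothetical helper: fraction of stamped inputs predicted as target_label."""
    model.eval()
    hits, total = 0, 0
    for images, _ in loader:
        stamped = (1 - mask) * images + mask * pattern   # apply the inverted trigger
        preds = model(stamped).argmax(dim=1)
        hits += (preds == target_label).sum().item()
        total += images.size(0)
    return hits / total
```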
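Both evaluation datasets are public. As a hedged sketch only (the paper's preprocessing and the preparation of the 200-class ImageNet subset from Li et al. (2021d) are not specified here, so the transforms and paths below are assumptions), they could be loaded roughly as follows:

```python
import torchvision
import torchvision.transforms as T

# Illustrative loading only; normalization and augmentation choices are assumptions.
transform = T.Compose([T.ToTensor()])

cifar_train = torchvision.datasets.CIFAR10(root="./data", train=True,
                                            download=True, transform=transform)
cifar_test = torchvision.datasets.CIFAR10(root="./data", train=False,
                                           download=True, transform=transform)

# The 200-class ImageNet subset is not shipped with torchvision; assuming it has
# been prepared locally in an ImageFolder directory layout:
imagenet_train = torchvision.datasets.ImageFolder(root="./data/imagenet-subset/train",
                                                  transform=transform)
imagenet_test = torchvision.datasets.ImageFolder(root="./data/imagenet-subset/test",
                                                 transform=transform)
```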
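The defaults from the Experiment Setup row can be collected into a small configuration. The dictionary keys and the interpretation of "β as 10% of the input space" (read here as 10% of the number of input elements) are illustrative assumptions, not the authors' code:

```python
# Default hyperparameters as reported in the experiment setup; the comments map
# them back to the paper's notation, but their exact roles are not restated here.
DEFAULTS = {
    "alpha": 0.01,        # α
    "beta_ratio": 0.10,   # β is set to 10% of the input space
    "gamma": 0.85,        # γ
    "delta": 0.5,         # δ
}

def default_beta(height, width, channels, ratio=DEFAULTS["beta_ratio"]):
    """Hypothetical helper: β as 10% of the input-element count."""
    return ratio * height * width * channels

print(default_beta(32, 32, 3))   # e.g., CIFAR-10 inputs -> 307.2
```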