AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

Authors: Junfeng Guo, Ang Li, Cong Liu

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Evidenced by extensive experiments across multiple popular tasks and backdoor attacks, our approach is shown effective in detecting backdoor attacks under the black-box hard-label scenarios."
Researcher Affiliation | Academia | "Junfeng Guo, Ang Li, Cong Liu. Department of Computer Science, The University of Texas at Dallas. {jxg170016,angli,cong}@utdallas.edu"
Pseudocode | Yes | "Algorithm 1: Aggregated Global Adversarial Peak (GAP)"; "Algorithm 2: Visualize µ"
Open Source Code | Yes | "In addition, we have released the implementation of AEVA in https://github.com/JunfengGo/AEVA-Blackbox-Backdoor-Detection-main."
Open Datasets | Yes | "Datasets. We evaluate our approach on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets."
Dataset Splits | No | The paper mentions using a "hold-out validation set of infected and benign models" for choosing the MAD detector threshold, but it does not provide specific train/validation/test splits (percentages or counts) for the main datasets (CIFAR-10, CIFAR-100, Tiny-ImageNet) used for model training and evaluation.
Hardware Specification | No | The paper does not specify the hardware (e.g., CPU or GPU models) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers.
Experiment Setup | Yes | "For each task, we use 200 samples for the gradient estimation, and the batch size for each {X_i}_{i=1}^n is set to 40 in Algorithm 1. Around 10% poison data is injected in training. Each model is trained for 200 epochs with data augmentation. We set the threshold value τ = 4 for our MAD detector, which means we identify the class whose corresponding anomaly index is larger than 4 as infected." Both steps are sketched below.
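The Experiment Setup row quotes a budget of 200 samples for gradient estimation with batches of 40. The paper's exact estimator is not reproduced in this report; the sketch below shows a generic two-point zeroth-order (random-direction) gradient estimate under that query budget. The names `loss_fn` and `sigma`, and the Gaussian antithetic-sampling choice, are illustrative assumptions rather than AEVA's implementation; in the hard-label setting the loss itself would have to be constructed from top-1 decisions only.

```python
import numpy as np

def estimate_gradient(loss_fn, x, n_samples=200, batch_size=40,
                      sigma=0.01, rng=None):
    """Monte-Carlo (zeroth-order) gradient estimate via random
    Gaussian directions, evaluated in query batches.

    loss_fn : callable mapping a batch of inputs to per-example
              scalar losses (treated as a black box).
    x       : the input around which the gradient is estimated.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    done = 0
    while done < n_samples:
        b = min(batch_size, n_samples - done)
        u = rng.standard_normal((b,) + x.shape)   # random directions
        plus = loss_fn(x + sigma * u)             # f(x + sigma * u)
        minus = loss_fn(x - sigma * u)            # f(x - sigma * u)
        # Antithetic two-point estimate of the directional derivative,
        # accumulated along each sampled direction.
        grad += np.tensordot((plus - minus) / (2 * sigma), u, axes=1)
        done += b
    return grad / n_samples
```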
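The τ = 4 threshold is applied to a MAD-based anomaly index. Assuming the standard median-absolute-deviation construction (with the 1.4826 consistency constant used in Neural Cleanse-style detectors), a minimal sketch of the per-class test might look like this; `gap_scores` is a hypothetical name for the per-class statistics (e.g., aggregated GAP values), and the two-sided deviation is an assumption, since the paper may use a one-sided variant.

```python
import numpy as np

def mad_anomaly_indices(gap_scores, tau=4.0):
    """MAD-based outlier test over per-class scores.

    Returns the anomaly index for each class and the indices of
    classes flagged as infected (anomaly index > tau).
    """
    scores = np.asarray(gap_scores, dtype=float)
    median = np.median(scores)
    # Median absolute deviation, scaled by 1.4826 so it is a consistent
    # estimator of the standard deviation under Gaussian data.
    mad = 1.4826 * np.median(np.abs(scores - median))
    anomaly_index = np.abs(scores - median) / max(mad, 1e-12)
    infected = np.where(anomaly_index > tau)[0]
    return anomaly_index, infected
```

A toy run on hypothetical per-class scores for a 10-class model, where class 3 is an outlier:

```python
scores = [0.11, 0.09, 0.12, 0.93, 0.10, 0.08, 0.13, 0.10, 0.11, 0.09]
idx, infected = mad_anomaly_indices(scores, tau=4.0)
print(infected)  # -> [3]
```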