Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks

Authors: Nguyen Hung-Quang, Yingjie Lao, Tung Pham, Kok-Seng Wong, Khoa D Doan

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the empirical performance of the proposed randomized feature defense.
Researcher Affiliation | Collaboration | Quang H. Nguyen (1), Yingjie Lao (2), Tung Pham (3), Kok-Seng Wong (1), Khoa D. Doan (1). (1) College of Engineering and Computer Science, VinUniversity, Vietnam; (2) Tufts University; (3) VinAI Research.
Pseudocode | Yes | Algorithm 1: Randomized Feature Defense
Input: a model f, input data x, noise statistics Σ, a set of perturbed layers H = {h_{l_0}, h_{l_1}, ..., h_{l_n}}
Output: logit vector l
z_0 ← x
for each layer h_i in the model do
    if h_i ∈ H then
        δ ∼ N(0, Σ)
        z_i ← h_i(z_{i-1}) + δ
    else
        z_i ← h_i(z_{i-1})
    end if
end for
l ← z_n
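To make the algorithm concrete, here is a minimal PyTorch sketch of the same noise-injection idea using forward hooks; the helper name, the choice of layers, and the isotropic noise scale sigma are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

def add_randomized_feature_defense(model: nn.Module, layer_names, sigma=0.05):
    """Add fresh Gaussian noise delta ~ N(0, sigma^2 I) to the output of each
    named layer on every forward pass, mirroring the noisy layers H above."""
    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the layer's output.
        return output + sigma * torch.randn_like(output)

    modules = dict(model.named_modules())
    return [modules[name].register_forward_hook(hook) for name in layer_names]

# Illustrative usage on a torchvision ResNet (layer names are hypothetical):
#   model = torchvision.models.resnet18(weights=None).eval()
#   handles = add_randomized_feature_defense(model, ["layer3", "layer4"])
#   logits = model(x)             # noisy logits: they differ across queries
#   for h in handles: h.remove()  # detach the hooks to disable the defense
```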
Open Source Code | Yes | Code is available at https://github.com/mail-research/randomized_defenses
Open Datasets | Yes | Datasets. We perform our experiments on two widely used benchmark datasets in adversarial robustness: CIFAR-10 (Krizhevsky & Hinton, 2009) and ImageNet (Russakovsky et al., 2015).
Dataset Splits | Yes | ImageNet (ILSVRC) 2012 is a large-scale dataset that consists of 1000 classes. The training set includes 1,281,167 images, the validation set includes 50,000 images, and the test set has 100,000 images.
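As a sketch of how these splits are typically loaded (the paths, transforms, and use of torchvision are assumptions; the paper does not describe its data pipeline):

```python
import torchvision
import torchvision.transforms as T

# CIFAR-10 test split (10,000 images, 10 classes).
cifar10_test = torchvision.datasets.CIFAR10(
    root="./data", train=False, download=True, transform=T.ToTensor())

# ImageNet (ILSVRC 2012) validation split (50,000 images, 1000 classes);
# evaluation is usually run on "val" since test-set labels are withheld.
imagenet_val = torchvision.datasets.ImageNet(
    root="/path/to/imagenet", split="val",
    transform=T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()]))
```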
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using the 'timm' package for pretrained weights but does not specify its version, nor does it pin versions for any other software dependencies.
Experiment Setup | Yes | The detailed hyperparameters of each attack are as follows.
Square attack: the initial probability of pixel change is 0.05 for the ℓ∞ attack and 0.1 for the ℓ2 attack.
NES: the gradient is estimated by finite differences with 60 samples for the ℓ∞ attack and 30 for the ℓ2 attack. The finite-difference step size is 0.01 and 0.005, and the learning rate is 0.005 and 1 for the ℓ∞ and ℓ2 attacks, respectively.
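For reference, below is a minimal sketch of NES gradient estimation by antithetic finite differences using the ℓ∞ settings quoted above (60 samples, step size 0.01); `loss_fn` is an assumed black-box query function returning a scalar loss, and the PGD-style update in the trailing comment uses the quoted learning rate of 0.005.

```python
import torch

def nes_gradient(loss_fn, x, n_samples=60, fd_step=0.01):
    """NES black-box gradient estimate via antithetic finite differences.

    loss_fn(x) must return a scalar loss computed from model queries only
    (no backpropagation). The quoted l_inf settings are n_samples=60 and
    fd_step=0.01; the l_2 attack instead uses 30 samples and 0.005.
    """
    grad = torch.zeros_like(x)
    for _ in range(n_samples // 2):
        u = torch.randn_like(x)  # Gaussian search direction
        # Each antithetic pair costs two queries to the defended model.
        grad += (loss_fn(x + fd_step * u)
                 - loss_fn(x - fd_step * u)) / (2 * fd_step) * u
    return grad / (n_samples // 2)

# One l_inf ascent step with the quoted learning rate of 0.005:
#   x_adv = (x_adv + 0.005 * nes_gradient(loss_fn, x_adv).sign()).clamp(0, 1)
```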