MultiRobustBench: Benchmarking Robustness Against Multiple Attacks

Authors: Sihui Dai, Saeed Mahloujifar, Chong Xiang, Vikash Sehwag, Pin-Yu Chen, Prateek Mittal

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using our framework, we present the first leaderboard, MultiRobustBench (https://multirobustbench.github.io), for benchmarking multiattack evaluation which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including ℓp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total)." (A sketch of this attack-grid evaluation appears after the table.)
Researcher Affiliation | Collaboration | Sihui Dai¹, Saeed Mahloujifar¹, Chong Xiang¹, Vikash Sehwag¹, Pin-Yu Chen², Prateek Mittal¹. ¹Electrical and Computer Engineering, Princeton University; ²IBM Research.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to a leaderboard website (https://multirobustbench.github.io) for benchmarking and results, but does not explicitly state that the source code for the methodology is available at this link or elsewhere.
Open Datasets | Yes | "We provide 2 leaderboards for the CIFAR-10 dataset."
Dataset Splits | No | The paper describes using the test set to select the checkpoint with the highest robust accuracy, so the test set doubles as a validation set for model selection; no separate validation split is defined.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models or memory specifications, used for running the experiments.
Software Dependencies | No | The paper mentions using AutoAttack and PGD for adversarial example generation, but does not specify version numbers or other software dependencies with versions. (A minimal PGD sketch appears after the table.)
Experiment Setup | Yes | "We train all models with batch size of 256 for 100 epochs and evaluate the model saved at the epoch which achieves highest robust accuracy on the test set. We train models using SGD with initial learning rate of 0.1. Learning rate drops to 0.01 after half of the training epochs and drops to 0.001 after 3/4 of the training epochs. For all threat models... we use 20 iterations to find adversarial examples with step size ϵ/18." (A sketch of this training schedule also follows the table.)
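To make the benchmarking protocol concrete, the sketch below loops over a grid of attack types and strengths and records robust accuracy for each cell, mirroring the 9 attack types at 20 strengths described above. This is a minimal illustration in PyTorch, not the authors' released evaluation code; `evaluate_attack_grid` and its `attacks`/`strengths` arguments are hypothetical stand-ins for concrete attack implementations (e.g. the `pgd_linf` sketch that follows).

```python
import torch

def evaluate_attack_grid(model, loader, attacks, strengths):
    """Robust accuracy for every (attack type, strength) cell.

    attacks:   dict mapping an attack name to fn(model, x, y, eps) -> x_adv.
    strengths: dict mapping the same names to a list of eps values.
    """
    results = {}
    for name, attack in attacks.items():
        for eps in strengths[name]:  # e.g. 20 strengths per attack type
            correct, total = 0, 0
            for x, y in loader:
                x_adv = attack(model, x, y, eps)      # craft adversarial batch
                with torch.no_grad():
                    correct += (model(x_adv).argmax(dim=1) == y).sum().item()
                total += y.numel()
            results[(name, eps)] = correct / total    # robust accuracy
    return results
```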
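For the PGD attack mentioned under Software Dependencies, here is a minimal ℓ∞ PGD sketch, assuming a PyTorch classifier over inputs in [0, 1]. The step size ϵ/18 follows the quoted experiment setup; everything else (cross-entropy loss, sign ascent, projection) is the standard formulation rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, n_iters=20):
    """Minimal L-inf PGD: ascend the loss, project back into the eps-ball."""
    step_size = eps / 18  # step size quoted in the experiment setup
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_iters):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()        # gradient-sign ascent
            delta.clamp_(-eps, eps)                       # project onto eps-ball
            delta.copy_((x + delta).clamp(0.0, 1.0) - x)  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()
```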
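The quoted training setup translates directly into a standard SGD schedule with 10x learning-rate drops at the halfway and three-quarter points. The sketch below assumes PyTorch; the model and data loader are dummy stand-ins, and momentum is an assumption (the excerpt specifies only the initial learning rate and the drop points).

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; the paper trains CIFAR-10 classifiers, so any
# CIFAR-10 architecture and loader could be substituted here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = [(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))]

epochs = 100
# Momentum is assumed; the quote specifies only SGD with initial lr 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# lr drops to 0.01 at epoch 50 (half) and 0.001 at epoch 75 (three quarters).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)

for epoch in range(epochs):
    for x, y in train_loader:  # batch size 256, per the paper
        optimizer.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        optimizer.step()
    scheduler.step()  # advance the learning-rate schedule once per epoch
```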