MultiRobustBench: Benchmarking Robustness Against Multiple Attacks

Authors: Sihui Dai, Saeed Mahloujifar, Chong Xiang, Vikash Sehwag, Pin-Yu Chen, Prateek Mittal

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Using our framework, we present the first leaderboard, MultiRobustBench (https://multirobustbench.github.io), for benchmarking multiattack evaluation which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including ℓp-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total)." (A sketch of this attack-grid evaluation appears after the table.)
Researcher Affiliation | Collaboration | Sihui Dai¹, Saeed Mahloujifar¹, Chong Xiang¹, Vikash Sehwag¹, Pin-Yu Chen², Prateek Mittal¹. ¹Electrical and Computer Engineering, Princeton University; ²IBM Research.
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper refers to a leaderboard website (https://multirobustbench.github.io) for benchmarking and results, but does not explicitly state that the source code for the methodology is available at this link or elsewhere.
Open Datasets | Yes | "We provide 2 leaderboards for the CIFAR-10 dataset."
Dataset Splits | No | The paper describes using the test set to select the checkpoint with the highest robust accuracy, so the test set doubles as a validation set for model selection; no separate validation split is defined.
Hardware Specification | No | The paper does not provide specific hardware details, such as GPU/CPU models or memory specifications, used for running the experiments.
Software Dependencies | No | The paper mentions using AutoAttack and PGD for adversarial example generation, but does not specify version numbers or other software dependencies with versions. (A minimal PGD sketch appears after the table.)
Experiment Setup | Yes | "We train all models with batch size of 256 for 100 epochs and evaluate the model saved at the epoch which achieves highest robust accuracy on the test set. We train models using SGD with initial learning rate of 0.1. Learning rate drops to 0.01 after half of the training epochs and drops to 0.001 after 3/4 of the training epochs. For all threat models... we use 20 iterations to find adversarial examples with step size ϵ/18." (A sketch of this training schedule also follows the table.)
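To make the benchmarking protocol concrete, the sketch below loops over a grid of attack types and strengths and records robust accuracy for each cell, mirroring the 9 attack types at 20 strengths described above. This is a minimal illustration in PyTorch, not the authors' released evaluation code; `evaluate_attack_grid` and its `attacks`/`strengths` arguments are hypothetical stand-ins for concrete attack implementations (e.g. the `pgd_linf` sketch that follows).

```python
import torch

def evaluate_attack_grid(model, loader, attacks, strengths):
    """Robust accuracy for every (attack type, strength) cell.

    attacks:   dict mapping an attack name to fn(model, x, y, eps) -> x_adv.
    strengths: dict mapping the same names to a list of eps values.
    """
    results = {}
    for name, attack in attacks.items():
        for eps in strengths[name]:  # e.g. 20 strengths per attack type
            correct, total = 0, 0
            for x, y in loader:
                x_adv = attack(model, x, y, eps)      # craft adversarial batch
                with torch.no_grad():
                    correct += (model(x_adv).argmax(dim=1) == y).sum().item()
                total += y.numel()
            results[(name, eps)] = correct / total    # robust accuracy
    return results
```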
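For the PGD attack mentioned under Software Dependencies, here is a minimal ℓ∞ PGD sketch, assuming a PyTorch classifier over inputs in [0, 1]. The step size ϵ/18 follows the quoted experiment setup; everything else (cross-entropy loss, sign ascent, projection) is the standard formulation rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps, n_iters=20):
    """Minimal L-inf PGD: ascend the loss, project back into the eps-ball."""
    step_size = eps / 18  # step size quoted in the experiment setup
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(n_iters):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()        # gradient-sign ascent
            delta.clamp_(-eps, eps)                       # project onto eps-ball
            delta.copy_((x + delta).clamp(0.0, 1.0) - x)  # keep pixels in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()
```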
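The quoted training setup translates directly into a standard SGD schedule with 10x learning-rate drops at the halfway and three-quarter points. The sketch below assumes PyTorch; the model and data loader are dummy stand-ins, and momentum is an assumption (the excerpt specifies only the initial learning rate and the drop points).

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; the paper trains CIFAR-10 classifiers, so any
# CIFAR-10 architecture and loader could be substituted here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_loader = [(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))]

epochs = 100
# Momentum is assumed; the quote specifies only SGD with initial lr 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# lr drops to 0.01 at epoch 50 (half) and 0.001 at epoch 75 (three quarters).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[epochs // 2, 3 * epochs // 4], gamma=0.1)

for epoch in range(epochs):
    for x, y in train_loader:  # batch size 256, per the paper
        optimizer.zero_grad()
        nn.functional.cross_entropy(model(x), y).backward()
        optimizer.step()
    scheduler.step()  # advance the learning-rate schedule once per epoch
```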