Increasing Confidence in Adversarial Robustness Evaluations

Authors: Roland S. Zimmermann, Wieland Brendel, Florian Tramèr, Nicholas Carlini

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "For eleven out of thirteen previously-published defenses, the original evaluation of the defense fails our test, while stronger attacks that break these defenses pass it. We show that our test would have potentially identified eleven out of thirteen weak evaluations found in peer-reviewed papers." |
| Researcher Affiliation | Collaboration | Roland S. Zimmermann (University of Tübingen, Tübingen AI Center); Wieland Brendel (University of Tübingen, Tübingen AI Center); Florian Tramèr (Google); Nicholas Carlini (Google) |
| Pseudocode | Yes | Algorithm 1: "Binarization Test for Classifiers with Linear Classification Readouts" (see the sketch after the table) |
| Open Source Code | Yes | "Online version & code: zimmerrol.github.io/active-tests/ We included the implementation of our proposed test as well as the code to reproduce the results for the defenses investigated in this work." |
| Open Datasets | Yes | "Since the dataset used in this work is a standard dataset (CIFAR-10) we do not discuss the aforementioned issues." |
| Dataset Splits | Yes | "A detailed overview of the experimental details including the hyperparameters used is given in Section B." |
| Hardware Specification | Yes | "A description of the used hardware and total amount of compute is presented in Section B." |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers. |
| Experiment Setup | Yes | "A detailed overview of the experimental details including the hyperparameters used is given in Section B." |
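
The binarization test referenced in the Pseudocode row replaces the defense's linear readout with a fresh binary readout constructed so that adversarial examples provably exist inside the threat model; if the defense's own attack cannot break this deliberately vulnerable classifier, the original evaluation is likely too weak. The following is a minimal Python sketch of that idea, not the authors' exact Algorithm 1: `feature_extractor` and `run_attack` are hypothetical stand-ins for the defense's frozen backbone and the attack under evaluation, and the sampling scheme is illustrative.

```python
# Minimal sketch of the binarization-test idea (illustrative, not the
# paper's Algorithm 1). For a clean input x, build a binary task in the
# defense's feature space that contains adversarial examples inside the
# eps-ball by construction, then check whether the evaluated attack
# actually finds them.

import numpy as np
from sklearn.svm import LinearSVC


def binarization_test(x, feature_extractor, run_attack, eps,
                      n_inner=32, n_boundary=8, seed=None):
    rng = np.random.default_rng(seed)

    # Inner samples: small perturbations of x, labeled 0 ("clean").
    # Inputs are assumed to lie in [0, 1], as for CIFAR-10 images.
    inner = np.clip(
        x + rng.uniform(-0.5 * eps, 0.5 * eps, size=(n_inner,) + x.shape),
        0.0, 1.0)

    # Boundary samples: perturbations near the edge of the eps-ball,
    # labeled 1 ("adversarial"). By construction these lie within the
    # threat model, so a solution to the attack problem exists.
    signs = rng.choice([-1.0, 1.0], size=(n_boundary,) + x.shape)
    boundary = np.clip(x + signs * eps, 0.0, 1.0)

    feats = feature_extractor(np.concatenate([inner, boundary]))
    labels = np.concatenate([np.zeros(n_inner), np.ones(n_boundary)])

    # New linear readout on top of the frozen feature extractor.
    readout = LinearSVC().fit(feats, labels)

    def binarized_classifier(batch):
        return readout.decision_function(feature_extractor(batch))

    # `run_attack` is a hypothetical interface: it should perturb x within
    # the eps-ball to flip the binarized classifier's decision. A strong
    # attack succeeds here; if the defense's own attack fails, the original
    # robustness evaluation is suspect.
    x_adv = run_attack(binarized_classifier, x, label=0, eps=eps)
    return binarized_classifier(x_adv[None])[0] > 0  # True = attack passed
```

Under these assumptions, the test is run per sample and the attack's success rate on the binarized classifiers is compared against a random-attack baseline, mirroring the paper's pass/fail criterion at a high level.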