Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Adversarial Training and Robustness for Multiple Perturbations
Authors: Florian Tramèr, Dan Boneh
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with empirical evaluations of the robustness trade-off on MNIST and CIFAR10. MNIST is an interesting case-study as distinct models achieve strong robustness to different ℓp and spatial attacks [31, 11]. Despite the dataset's simplicity, we show that no single model achieves strong ℓ∞, ℓ1 and ℓ2 robustness, and that new techniques are required to close this gap. The code used for all of our experiments can be found here: https://github.com/ftramer/MultiRobustness |
| Researcher Affiliation | Academia | Florian Tramèr Stanford University Dan Boneh Stanford University |
| Pseudocode | Yes | Algorithm 1: The Sparse ℓ1 Descent Attack (SLIDE). |
| Open Source Code | Yes | The code used for all of our experiments can be found here: https://github.com/ftramer/MultiRobustness |
| Open Datasets | Yes | We experiment with MNIST and CIFAR10. |
| Dataset Splits | No | The paper mentions training on MNIST and CIFAR10 and evaluating on 1000 test points, but does not provide specific train/validation/test dataset split percentages, sample counts, or clear citations to predefined standard splits for reproducibility. |
| Hardware Specification | No | The paper mentions training models and experiments but does not provide specific details on the hardware used, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper references various attacks and models, and states that 'The code used for all of our experiments can be found here: https://github.com/ftramer/MultiRobustness', but it does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) within the text. |
| Experiment Setup | Yes | For MNIST, we use ℓ1 (ϵ = 10), ℓ2 (ϵ = 2) and ℓ∞ (ϵ = 0.3). For CIFAR10 we use ℓ∞ (ϵ = 4/255) and ℓ1 (ϵ = 2000/255). We also train on rotation-translation attacks with 3px translations and 30° rotations as in [11]. ... PGD [25] and our SLIDE attack with 100 steps and 40 restarts (20 restarts on CIFAR10). |
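
The perturbation budgets and attack settings quoted in the Experiment Setup row can be written down as a small configuration. The sketch below is illustrative only and is not taken from the authors' repository: the names `BUDGETS`, `ATTACK_STEPS`, `ATTACK_RESTARTS`, `perturbation_norm`, and `within_budget` are hypothetical, and it assumes pixel values are scaled to [0, 1], which the quoted text does not state explicitly.

```python
import math
from typing import Iterable

# Budgets (epsilon) as quoted in the Experiment Setup row.
# Assumption: pixel values are scaled to [0, 1].
BUDGETS = {
    "MNIST": {"linf": 0.3, "l1": 10.0, "l2": 2.0},
    "CIFAR10": {"linf": 4 / 255, "l1": 2000 / 255},
}

# Attack settings quoted in the same row: 100 steps, 40 restarts (20 on CIFAR10).
ATTACK_STEPS = 100
ATTACK_RESTARTS = {"MNIST": 40, "CIFAR10": 20}


def perturbation_norm(delta: Iterable[float], norm: str) -> float:
    """Return the l-infinity, l1, or l2 norm of a flattened perturbation."""
    values = [abs(float(v)) for v in delta]
    if norm == "linf":
        return max(values, default=0.0)
    if norm == "l1":
        return math.fsum(values)
    if norm == "l2":
        return math.sqrt(math.fsum(v * v for v in values))
    raise ValueError(f"unknown norm: {norm!r}")


def within_budget(delta: Iterable[float], dataset: str, norm: str,
                  tol: float = 1e-9) -> bool:
    """Check that a flattened perturbation respects the quoted budget."""
    return perturbation_norm(delta, norm) <= BUDGETS[dataset][norm] + tol
```

For example, a perturbation that changes every one of the 784 MNIST pixels by 0.3 satisfies the ℓ∞ budget but far exceeds the ℓ1 budget of 10, which illustrates why the different threat models are evaluated separately.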