Robustness between the worst and average case

Authors: Leslie Rice, Anna Bair, Huan Zhang, J. Zico Kolter

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that our approach provides substantially better estimates than simple random sampling of the actual intermediate-q robustness of standard, data-augmented, and adversarially-trained classifiers, illustrating a clear tradeoff between classifiers that optimize different metrics."
Researcher Affiliation | Collaboration | Leslie Rice, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, larice@cs.cmu.edu; Anna Bair, Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, abair@cmu.edu; Huan Zhang, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, huan@huan-zhang.com; J. Zico Kolter, Department of Computer Science, Carnegie Mellon University & Bosch Center for Artificial Intelligence, Pittsburgh, PA, zkolter@cs.cmu.edu
Pseudocode | Yes | "Algorithm 1: Evaluating the intermediate-q robustness of a neural network function h using path sampling estimation with m MCMC samples, with (x, y) ∼ D, for some norm and order q." (A minimal sketch of the baseline estimator this algorithm refines follows the table.)
Open Source Code | Yes | "Code for reproducing experiments can be found at https://github.com/locuslab/intermediate_robustness."
Open Datasets | Yes | "All of our experiments are either run on the MNIST dataset [LeCun et al., 1998] or the CIFAR-10 dataset [Krizhevsky et al., 2009]."
Dataset Splits | No | The paper mentions using the MNIST and CIFAR-10 datasets for experiments but does not provide specific details on training, validation, and test splits (e.g., percentages, sample counts, or explicit mention of a validation set for hyperparameter tuning).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as the Python version, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used in the implementation.
Experiment Setup | Yes | "On MNIST, Ẑ_MC is computed with m = 2000, Ẑ_PS+HMC with m = 100, L = 20, and Adv. loss corresponds to PGD with 100 iterations. On CIFAR-10, Ẑ_MC is computed with m = 500, Ẑ_PS+HMC with m = 50, L = 10, and Adv. loss corresponds to PGD with 50 iterations and 10 restarts. For the MC estimate computed during training, we use m = 50 samples, whereas for the PS+HMC estimate we use m = 25 samples with L = 2 leapfrog steps." (These hyperparameters are collected into a configuration sketch below.)
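
For readers skimming the table, here is what the Pseudocode row is estimating. The intermediate-q robustness of a classifier h at (x, y) is the q-th root of the expected q-th power of the loss over random perturbations, and the naive Monte Carlo estimator Ẑ_MC simply samples perturbations and averages. The sketch below is a minimal illustration of that baseline estimator, assuming a PyTorch classifier, uniform sampling from an ℓ∞ ball, and a cross-entropy loss; the function name `intermediate_q_mc` and all default values are illustrative assumptions, not taken from the paper's released code.

```python
import math

import torch
import torch.nn.functional as F


def intermediate_q_mc(model, x, y, q=10.0, eps=0.1, m=2000):
    """Naive Monte Carlo estimate of intermediate-q robustness at (x, y).

    Perturbations are drawn uniformly from the l-infinity ball of radius
    `eps`; the ball, loss, and defaults are assumptions for this sketch.
    """
    log_losses = []
    with torch.no_grad():
        for _ in range(m):
            # One uniform sample from [-eps, eps]^d around the input.
            delta = (torch.rand_like(x) * 2 - 1) * eps
            loss = F.cross_entropy(model(x + delta), y, reduction="none")
            log_losses.append(torch.log(loss + 1e-12))  # guard against log(0)
    log_losses = torch.stack(log_losses)  # shape (m, batch)
    # Mean of loss^q computed in log space for numerical stability,
    # then the q-th root: ((1/m) * sum_i loss_i^q)^(1/q).
    log_moment = torch.logsumexp(q * log_losses, dim=0) - math.log(m)
    return torch.exp(log_moment / q)
```

As q → 1 this approaches the average-case loss, and as q → ∞ the worst-case (adversarial) loss. The paper's Algorithm 1 replaces this high-variance baseline with path sampling driven by HMC, which is what the L (leapfrog steps) values in the Experiment Setup row configure.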
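
Separately, the dense Experiment Setup quote is easier to scan as structured configuration. The dictionaries below only restate the numbers from that row; the names `EVAL_CONFIG` and `TRAIN_CONFIG` and the nesting are hypothetical and need not match the released repository.

```python
# Evaluation-time settings, restated from the Experiment Setup row.
# m = number of samples; leapfrog_steps = L, the HMC leapfrog steps.
EVAL_CONFIG = {
    "mnist": {
        "mc":     {"m": 2000},
        "ps_hmc": {"m": 100, "leapfrog_steps": 20},
        "adv":    {"attack": "pgd", "iters": 100},
    },
    "cifar10": {
        "mc":     {"m": 500},
        "ps_hmc": {"m": 50, "leapfrog_steps": 10},
        "adv":    {"attack": "pgd", "iters": 50, "restarts": 10},
    },
}

# Lighter settings used when the estimates are computed during training.
TRAIN_CONFIG = {
    "mc":     {"m": 50},
    "ps_hmc": {"m": 25, "leapfrog_steps": 2},
}
```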