Robustness between the worst and average case
Authors: Leslie Rice, Anna Bair, Huan Zhang, J. Zico Kolter
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach provides substantially better estimates than simple random sampling of the actual intermediate-q robustness of standard, data-augmented, and adversarially-trained classifiers, illustrating a clear tradeoff between classifiers that optimize different metrics. |
| Researcher Affiliation | Collaboration | Leslie Rice, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA (larice@cs.cmu.edu); Anna Bair, Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA (abair@cmu.edu); Huan Zhang, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA (huan@huan-zhang.com); J. Zico Kolter, Department of Computer Science, Carnegie Mellon University & Bosch Center for Artificial Intelligence, Pittsburgh, PA (zkolter@cs.cmu.edu) |
| Pseudocode | Yes | Algorithm 1: Evaluating the intermediate-q robustness of a neural network function h using path sampling estimation with m MCMC samples, with x, y ∼ D, for some norm q. (A minimal sketch of the plain Monte Carlo baseline appears after the table.) |
| Open Source Code | Yes | Code for reproducing experiments can be found at https://github.com/locuslab/intermediate_robustness. |
| Open Datasets | Yes | All of our experiments are either run on the MNIST dataset [LeCun et al., 1998] or the CIFAR-10 dataset [Krizhevsky et al., 2009]. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets for experiments but does not provide specific details on training, validation, and test splits (e.g., percentages, sample counts, or explicit mention of a validation set for hyperparameter tuning). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as Python version, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used in the implementation. |
| Experiment Setup | Yes | On MNIST, Ẑ_MC is computed with m = 2000; Ẑ_PS+HMC with m = 100 and L = 20; and the Adv. loss corresponds to PGD with 100 iterations. On CIFAR-10, Ẑ_MC is computed with m = 500; Ẑ_PS+HMC with m = 50 and L = 10; and the Adv. loss corresponds to PGD with 50 iterations and 10 restarts. For the MC estimate computed during training, we use m = 50 samples, whereas for the PS+HMC estimate we use m = 25 samples with L = 2 leapfrog steps. (Illustrative sketches of the Ẑ_MC estimator and the PGD adversarial loss appear below.) |
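
To make the quantities in the table concrete, the following is a minimal sketch of the plain Monte Carlo estimator Ẑ_MC, assuming the intermediate-q robustness is the functional ℓq norm of the loss over a uniform ℓ∞ ball around the input (the function and argument names here are illustrative, not the authors' API):

```python
import torch
import torch.nn.functional as F

def z_mc(h, x, y, q, eps, m=2000):
    """Illustrative Monte Carlo estimate of intermediate-q robustness:
    the functional l_q norm of the loss over a uniform l_inf ball.
    h: classifier; x: input batch of shape [1, ...]; y: label of shape [1]."""
    with torch.no_grad():
        losses = torch.stack([
            F.cross_entropy(h(x + torch.empty_like(x).uniform_(-eps, eps)), y)
            for _ in range(m)
        ])
    # (E[loss^q])^(1/q): q = 1 recovers the average case, while
    # q -> infinity approaches the worst case over the ball.
    return losses.pow(q).mean().pow(1.0 / q)
```

This simple-random-sampling baseline is what the paper's Ẑ_PS+HMC estimator (Algorithm 1, path sampling with Hamiltonian Monte Carlo and L leapfrog steps) is reported to improve on.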
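The "Adv. loss" entries refer to a PGD adversarial loss. A generic ℓ∞ PGD loop (not the authors' exact implementation; alpha is an assumed step size, and clamping inputs to [0, 1] is omitted for brevity) looks like:

```python
import torch
import torch.nn.functional as F

def pgd_loss(h, x, y, eps, alpha=0.01, iters=100):
    """Generic l_inf PGD adversarial loss (illustrative sketch):
    maximize the loss within the eps-ball via projected gradient ascent."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(iters):
        F.cross_entropy(h(x + delta), y).backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # gradient-ascent step
            delta.clamp_(-eps, eps)             # project onto the l_inf ball
        delta.grad.zero_()
    return F.cross_entropy(h(x + delta), y)
```

The CIFAR-10 setting additionally uses 10 restarts, i.e., rerunning a loop like this from different random initializations of delta and keeping the worst-case loss.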