A Statistical Approach to Assessing Neural Network Robustness
Authors: Stefan Webb, Tom Rainforth, Yee Whye Teh, M. Pawan Kumar
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that our approach is able to emulate formal verification procedures on benchmark problems, while scaling to larger networks and providing reliable additional information in the form of accurate estimates of the violation probability. Section 6 'Experiments' details the validation of our approach on several models and datasets from the literature, including COLLISIONDETECTION, MNIST, CIFAR 10, and CIFAR 100, and compares its performance against naive Monte Carlo methods, providing metrics such as 'estimates of I' and 'speed up'. |
| Researcher Affiliation | Academia | Stefan Webb (Department of Engineering Science, University of Oxford); Tom Rainforth and Yee Whye Teh (Department of Statistics, University of Oxford); M. Pawan Kumar (Department of Engineering Science, University of Oxford, and Alan Turing Institute) |
| Pseudocode | Yes | Algorithm 1: Adaptive multi-level splitting with termination criterion (a hedged sketch of this scheme follows the table) |
| Open Source Code | Yes | Code to reproduce all experimental results is available at https://github.com/oval-group/statistical-robustness. |
| Open Datasets | Yes | To help elucidate our problem setting, we consider the ACASXU dataset (Katz et al., 2017) from the formal verification literature. We used the COLLISIONDETECTION dataset introduced in the formal verification literature by Ehlers (2017). To validate the algorithm on a higher-dimensional problem, we first tested adversarial properties on the MNIST and CIFAR 10 datasets... To demonstrate that our approach can be employed on large networks, we tested adversarial properties on the CIFAR 100 dataset... |
| Dataset Splits | No | The paper mentions using a 'test set' for the MNIST and CIFAR experiments (e.g., 'ten samples from the test set', '50 samples from the test set') and states that '500 properties are specified for verification' for COLLISIONDETECTION, but it gives no training/validation/test split details (percentages or per-partition sample counts), nor does it refer to standard splits precisely enough to reproduce the exact data partitioning. |
| Hardware Specification | No | The paper mentions that 'GPU memory is exhausted' when discussing the computational infeasibility of certain experiments with the method of Wong & Kolter (2018). However, it does not specify the GPU models, CPU types, or other hardware configurations used to run the authors' own experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, libraries, or framework versions (e.g., 'Python 3.8', 'PyTorch 1.9', 'TensorFlow 2.x') used for implementing or running the experiments. |
| Experiment Setup | Yes | We ran our approach on all 500 properties, setting ρ = 0.1, N = 10⁴, M = 1000... multilevel splitting was run on ten samples from the test set at multiple values of ϵ, with N = 10000 and ρ = 0.1, and M ∈ {100, 250, 1000} for MNIST and M ∈ {100, 250, 500, 1000, 2000} for CIFAR 10. For CIFAR 100... we set N = 300... The robustification phase trains the classifier to be robust in an ℓ∞ ϵ-ball around the inputs, where ϵ is annealed from 0.01 to 0.1 over the first 50 epochs. (The second sketch after this table wires these settings into the illustrative estimator.) |
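The pseudocode entry above names Algorithm 1, adaptive multi-level splitting (AMLS) with a termination criterion, which estimates a violation probability of the form I = P[s(X) ≥ 0] (with s the property function) that naive Monte Carlo struggles to resolve when I is very small. The sketch below is a minimal NumPy rendering of the general AMLS scheme, not the authors' implementation (their code is in the linked repository); the interface (`sample_prior`, `prop_fn`, `mh_kernel`) and the reading of ρ as the fraction of samples retained per level and M as the number of Metropolis-Hastings rejuvenation steps are assumptions made for illustration.

```python
import numpy as np

def amls_estimate(sample_prior, prop_fn, mh_kernel, N=10_000, rho=0.1, M=1000,
                  max_levels=100):
    """Adaptive multi-level splitting sketch: estimate I = P(prop_fn(X) >= 0).

    sample_prior(N)     -> (N, ...) samples from the input distribution.
    prop_fn(X)          -> (N,) property values s(x); s(x) >= 0 means violation.
    mh_kernel(X, level) -> X after one MCMC step targeting the prior restricted
                           to {x : s(x) >= level}.
    """
    X = sample_prior(N)
    s = prop_fn(X)
    log_I = 0.0

    for _ in range(max_levels):
        # Next intermediate level: the (1 - rho)-quantile, capped at the target 0,
        # so that roughly a fraction rho of the current particles survive.
        level = min(np.quantile(s, 1.0 - rho), 0.0)
        survived = s >= level
        log_I += np.log(survived.mean())   # accumulate the conditional probability

        if level >= 0.0:                   # termination criterion: target set reached
            return np.exp(log_I)

        # Resample survivors back up to N particles and rejuvenate them with M
        # MCMC transitions inside the current super-level set.
        X = X[np.random.choice(np.flatnonzero(survived), size=N, replace=True)]
        for _ in range(M):
            X = mh_kernel(X, level)
        s = prop_fn(X)

    return np.exp(log_I)  # hit max_levels; the running product is returned as-is
```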
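To tie the experiment-setup row to the sketch above, the following hypothetical instantiation evaluates one adversarial property with the headline settings quoted in that row (ρ = 0.1, N = 10⁴, M = 1000). Here `classifier`, `x0`, and `y` stand for a trained network returning logits, a test input as a NumPy array, and its label; the proposal step size in the MH kernel is an arbitrary illustrative value, not taken from the paper.

```python
import numpy as np

eps = 0.1  # radius of the l_inf ball of perturbations around x0

def sample_prior(N):
    # Inputs drawn uniformly from the eps-ball around the test point x0 (assumed array).
    return x0 + np.random.uniform(-eps, eps, size=(N,) + x0.shape)

def prop_fn(X):
    # s(x) = max_{j != y} logits_j(x) - logits_y(x); s(x) >= 0 iff x is misclassified.
    logits = classifier(X)                        # assumed: (N, num_classes) array
    best_other = np.delete(logits, y, axis=1).max(axis=1)
    return best_other - logits[:, y]

def mh_kernel(X, level):
    # One Metropolis-Hastings step with a symmetric uniform proposal, accepted only
    # if the candidate stays inside the eps-ball and above the current level.
    prop = X + np.random.uniform(-0.02, 0.02, size=X.shape)
    in_ball = np.abs(prop - x0).reshape(len(X), -1).max(axis=1) <= eps
    accept = in_ball & (prop_fn(prop) >= level)
    X[accept] = prop[accept]
    return X

I_hat = amls_estimate(sample_prior, prop_fn, mh_kernel, N=10_000, rho=0.1, M=1000)
```

With M = 1000 MCMC transitions per level, this pure NumPy version is slow on image-sized inputs; it is meant only to make the roles of ρ, N, and M concrete, and the authors' released code remains the reference implementation.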