A Statistical Approach to Assessing Neural Network Robustness

Authors: Stefan Webb, Tom Rainforth, Yee Whye Teh, M. Pawan Kumar

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our approach is able to emulate formal verification procedures on benchmark problems, while scaling to larger networks and providing reliable additional information in the form of accurate estimates of the violation probability. Section 6 'Experiments' details the validation of our approach on several models and datasets from the literature, including COLLISIONDETECTION, MNIST, CIFAR-10, and CIFAR-100, and compares its performance against naive Monte Carlo methods, providing metrics such as 'estimates of I' and 'speed up'.
Researcher Affiliation | Academia | Stefan Webb, Department of Engineering Science, University of Oxford; Tom Rainforth and Yee Whye Teh, Department of Statistics, University of Oxford; M. Pawan Kumar, Department of Engineering Science, University of Oxford, and Alan Turing Institute
Pseudocode | Yes | Algorithm 1: Adaptive multi-level splitting with termination criterion
Open Source Code | Yes | Code to reproduce all experimental results is available at https://github.com/oval-group/statistical-robustness.
Open Datasets | Yes | To help elucidate our problem setting, we consider the ACAS Xu dataset (Katz et al., 2017) from the formal verification literature. We used the COLLISIONDETECTION dataset introduced in the formal verification literature by Ehlers (2017). To validate the algorithm on a higher-dimensional problem, we first tested adversarial properties on the MNIST and CIFAR-10 datasets... To demonstrate that our approach can be employed on large networks, we tested adversarial properties on the CIFAR-100 dataset...
Dataset Splits | No | The paper mentions using a 'test set' for the MNIST and CIFAR experiments (e.g., 'ten samples from the test set', '50 samples from the test set'), and for the COLLISIONDETECTION dataset it states that '500 properties are specified for verification'. However, it does not give training/validation/test splits (e.g., percentages or sample counts per partition), nor does it explicitly reference standard splits in a way that would allow the exact data partitioning to be reproduced.
Hardware Specification | No | The paper mentions that 'GPU memory is exhausted' when discussing the computational infeasibility of certain experiments with the method of Wong & Kolter (2018), but it does not specify the GPU models, CPU types, or other hardware used to run the authors' own experiments.
Software Dependencies | No | The paper does not list specific software dependencies, libraries, or framework versions (e.g., 'Python 3.8', 'PyTorch 1.9', 'TensorFlow 2.x') used to implement or run the experiments.
Experiment Setup | Yes | We ran our approach on all 500 properties, setting ρ = 0.1, N = 10^4, M = 1000... multi-level splitting was run on ten samples from the test set at multiple values of ϵ, with N = 10000 and ρ = 0.1, and M ∈ {100, 250, 1000} for MNIST and M ∈ {100, 250, 500, 1000, 2000} for CIFAR-10. For CIFAR-100... we set N = 300... The robustification phase trains the classifier to be robust in an ℓ∞ ϵ-ball around the inputs, where ϵ is annealed from 0.01 to 0.1 over the first 50 epochs. (A hedged sketch of the splitting procedure with these settings follows the table.)
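For context, the sketch below illustrates the adaptive multi-level splitting estimator named in the Pseudocode row, parameterised by the quantities quoted in the Experiment Setup row (N samples, retained fraction ρ, M Metropolis-Hastings steps per level). It is a minimal reconstruction under stated assumptions, not the authors' released implementation from the repository above; the names amls_estimate, score_fn, sample_input, log_density and the proposal width sigma are illustrative.

```python
# Hedged sketch of adaptive multi-level splitting (AMLS) for estimating the
# probability that a property score s(x) is non-negative under an input
# distribution. Names and proposal details are assumptions for illustration,
# not the authors' code.
import numpy as np

def amls_estimate(score_fn, sample_input, log_density,
                  N=10_000, rho=0.1, M=1000, sigma=0.05,
                  max_levels=100, rng=None):
    """Estimate P[score_fn(x) >= 0] for x drawn from the input distribution."""
    rng = np.random.default_rng() if rng is None else rng
    x = sample_input(N, rng)          # (N, d) samples from the input distribution
    s = score_fn(x)                   # property scores; >= 0 means the property is violated
    log_est = 0.0                     # running log of the product of level acceptance rates

    for _ in range(max_levels):
        level = np.quantile(s, 1.0 - rho)         # next intermediate level
        if level >= 0.0:                          # final level reached
            return float(np.exp(log_est) * np.mean(s >= 0.0))
        survivors = s >= level
        log_est += np.log(np.mean(survivors))     # accumulate the conditional probability
        # Resample survivors back up to N particles.
        idx = rng.choice(np.flatnonzero(survivors), size=N, replace=True)
        x, s = x[idx], s[idx]
        # Rejuvenate with M Metropolis-Hastings steps restricted to {s >= level},
        # using a symmetric Gaussian proposal of width sigma.
        for _ in range(M):
            prop = x + sigma * rng.standard_normal(x.shape)
            s_prop = score_fn(prop)
            accept_ratio = np.exp(log_density(prop) - log_density(x))
            accept = (s_prop >= level) & (rng.random(N) < accept_ratio)
            x[accept], s[accept] = prop[accept], s_prop[accept]

    # Level 0 not crossed within max_levels; return the current estimate.
    return float(np.exp(log_est) * np.mean(s >= 0.0))
```

With a uniform input distribution over an ϵ-ball around a test point, log_density can simply return 0 inside the ball and -inf outside, so out-of-ball proposals are always rejected and the returned value estimates the violation probability I referred to in the Research Type row.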