Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Robustness Distributions in Neural Network Verification
Authors: Annelot Bosman, Aaron Berger, Holger H. Hoos, Jan N. van Rijn
JAIR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then analyse the distributions of these critical 𝜀 values over a given set of inputs for 12 MNIST classifiers widely used in the literature on neural network verification. Using a Kolmogorov-Smirnov test, we obtain support for the hypothesis that the critical 𝜀 values of 11 of these networks follow a log-normal distribution. Furthermore, we found no statistically significant differences between the critical 𝜀 distributions for training and testing data for 12 feed-forward neural networks on the MNIST dataset. |
| Researcher Affiliation | Academia | ANNELOT W. BOSMAN, Leiden University, The Netherlands; AARON BERGER, RWTH Aachen University, Germany; HOLGER H. HOOS, RWTH Aachen University, Germany and Leiden University, The Netherlands; JAN N. VAN RIJN, Leiden University, The Netherlands |
| Pseudocode | No | The paper describes the 𝑘-binary search algorithm in Section 3.3 "𝑘-binary Search" using natural language and conceptual steps, but does not present a formal, structured pseudocode block or algorithm listing. |
| Open Source Code | Yes | Lastly, we provide a ready-to-use Python package available on GitHub that can be used for creating robustness distributions and enables others to build upon our work [1]. The package is modular, such that any part can be changed, including the instance set under consideration, the robustness property or the verifiers used. This makes our results fully reproducible and will help others build on our work. Furthermore, all our networks and data are available on GitHub [2]. [1] See: https://github.com/ADA-research/VERONA [2] See: https://github.com/ADA-research/NNV_JAIR_robustness_distributions |
| Open Datasets | Yes | We analyse the critical 𝜀 distributions for 12 widely studied fully-connected MNIST neural networks... We investigate the effect adversarial training can have on the critical 𝜀 distribution of various neural networks for MNIST, CIFAR and GTSRB datasets. |
| Dataset Splits | Yes | Following the work of König et al. [25], we used the first 100 instances from the MNIST training and testing sets, respectively... For both CIFAR-10 and GTSRB, we randomly selected 100 testing and training images each, with random seed 42. Given that the GTSRB dataset contains 42 classes, we performed stratified random selection. |
| Hardware Specification | Yes | All experiments were carried out on a cluster of machines, each equipped with 2 Intel Xeon E5-2683 CPUs with 32 cores, 40MB cache size and 94GB of RAM. |
| Software Dependencies | Yes | We used Python 3.10 with CentOS 7.0. |
| Experiment Setup | Yes | We ran 𝑘-binary search with 200 𝜀 values, ranging from 0.001 to 0.4, in intervals of 0.002, i.e., (0.001, 0.003, . . . , 0.397, 0.399)... The time-out for each of these queries was set to one hour. For MNIST, we used a perturbation of 0.2 and for CIFAR-10 and GTSRB 8/255. For PGD training... For MNIST, we used a perturbation of 0.3, and for CIFAR-10 and GTSRB, we used a perturbation of 8/255... For all training methods, we performed hyperparameter optimisation using Optuna [1]; the final hyperparameter values can be found in Appendix I, Tables 21 and 22. |
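The log-normal claim quoted under Research Type can be checked with a one-sample Kolmogorov-Smirnov test against a fitted log-normal, as the paper describes. The sketch below uses a synthetic sample in place of real critical 𝜀 values (which would come from actual verification runs); the parameters of the synthetic distribution are illustrative, not taken from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical critical-epsilon sample; in the paper these come from
# running verifiers on 100 MNIST instances per network.
critical_eps = rng.lognormal(mean=-3.0, sigma=0.5, size=100)

# Fit a log-normal (location fixed at 0) and run a one-sample KS test
# against the fitted distribution.
shape, loc, scale = stats.lognorm.fit(critical_eps, floc=0)
statistic, p_value = stats.kstest(critical_eps, "lognorm",
                                  args=(shape, loc, scale))

# A large p-value means the log-normal hypothesis cannot be rejected.
print(f"KS statistic = {statistic:.4f}, p-value = {p_value:.4f}")
```

Note that testing against parameters fitted on the same sample makes the test conservative (the Lilliefors caveat); it serves here only to illustrate the procedure, not the paper's exact statistical protocol.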
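The experimental setup quotes a grid of 200 𝜀 values (0.001, 0.003, ..., 0.399) searched with 𝑘-binary search. A minimal sketch of the grid plus a plain binary search for the critical 𝜀 is shown below; `is_robust` is a hypothetical stand-in for a verifier query, and the single-query binary search simplifies the paper's 𝑘-binary search, which the paper describes only in prose.

```python
import numpy as np

# The epsilon grid from the quoted setup: 200 values (0.001, 0.003, ..., 0.399).
eps_grid = np.arange(0.001, 0.4, 0.002)

def critical_epsilon(is_robust, grid):
    """Return the largest epsilon on the grid at which the network is
    still verified robust, assuming robustness is monotone in epsilon.
    `is_robust` models a (hypothetical) verifier call."""
    lo, hi = 0, len(grid) - 1
    if not is_robust(grid[lo]):
        return None  # not robust even at the smallest perturbation
    while lo < hi:
        mid = (lo + hi + 1) // 2  # bias upward so the loop terminates
        if is_robust(grid[mid]):
            lo = mid
        else:
            hi = mid - 1
    return grid[lo]

# Toy monotone "verifier": robust strictly below a hidden threshold.
print(critical_epsilon(lambda e: e < 0.105, eps_grid))
```

Each `is_robust` call corresponds to one verification query, which the quoted setup caps at a one-hour timeout; in practice a timeout would need a third "unknown" outcome that this two-valued sketch omits.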