Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Efficient Formal Safety Analysis of Neural Networks

Authors: Shiqi Wang, Kexin Pei, Justin Whitehouse, Junfeng Yang, Suman Jana

NeurIPS 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experimental results show that on average Neurify is 5, 000 faster than Reluplex [17] and 20 than Relu Val [39].
Researcher Affiliation	Academia	Columbia University, NYC, NY 10027, USA
Pseudocode	No	The paper describes its methods in text and diagrams (Figure 2 illustrates the high-level workﬂow of Neurify) but does not include structured pseudocode or algorithm blocks.
Open Source Code	No	The paper mentions implementing Neurify and references third-party codebases like Nvidia-Autopilot-Keras and Udacity's self-driving car challenge, but does not provide a specific link or explicit statement about the open-sourcing of Neurify's code.
Open Datasets	Yes	We use the dataset from Udacity self-driving car challenge containing 101,396 training and 5,614 testing samples [4]. Our model s architecture is similar to the DAVE-2 self-driving car architecture from NVIDIA [6, 2] and it achieves similar 1-MSE as models used in [29].
Dataset Splits	No	The paper mentions '101,396 training and 5,614 testing samples' for the Udacity dataset, but does not specify a separate validation split or explicit proportions for validation data.
Hardware Specification	No	All our evaluations were performed on a Linux server running Ubuntu 16.04 with 8 CPU cores and 256GB memory.
Software Dependencies	Yes	We use the highly optimized Open BLAS1 library for matrix multiplications and lp_solve 5.52 for solving the linear constraints generated during the directed constraint reﬁnement process. We further use Gurobi 8.0.0 solver for L2-bounded safety properties.
Experiment Setup	Yes	We use Neurify to check six different types of safety properties of nine different networks trained on ﬁve different datasets. ... We formally analyze the network with inputs bounded by L , L1, brightness, and contrast as described in Section 3.3. We deﬁne the safe range of deviation of the output steering direction from the original steering angle to be less than 30 degrees. The total number of cases Neurify can verify are shown in Table 2. ... with timeout threshold set to 3,600 seconds.