Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Neuron Similarity-Based Neural Network Verification via Abstraction and Refinement

Authors: Yuehao Liu, Yansong Dong, Liang Zhao, Wensheng Wang, Cong Tian

IJCAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that ARVerifier significantly reduces network size and yields verification time reductions by 11.61%, 18.70%, and 12.20% compared to α, β-CROWN, Verinet, and Marabou, respectively.
Researcher Affiliation	Collaboration	1School of Computer Science and Technology, Xidian University 2Beijing Sunwise Information Technology Ltd EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 specifies the general neural network verification method based on CEGAR. Given a verification problem N, P, Q , the algorithm returns SAFE if the safety property is satisfied, or UNSAFE along with a counterexample cex.
Open Source Code	No	The paper states, "We have implemented this method as a tool named ARVerifier and integrated it with three state-of-the-art verification tools for evaluation on ACAS Xu and MNIST benchmarks." While it mentions implementation as a tool, there is no explicit statement about making the source code publicly available or providing a link to a repository.
Open Datasets	Yes	The neural networks considered in the experiments are from the fully connected benchmarks used in the international verification of neural networks competition (VNN-COMP) [Brix et al., 2024; Brix et al., 2023], specifically the ACAS Xu and MNIST datasets.
Dataset Splits	No	The paper describes how properties were tested on the ACAS Xu and MNIST datasets (e.g., "testing 4 properties across all 45 networks and 6 on a single network" for ACAS Xu, and "testing 25 images with l perturbations of 0.02 and 0.05" for MNIST). However, it does not specify any training/test/validation splits for the datasets themselves, as the paper focuses on verification of pre-existing DNNs, not their training.
Hardware Specification	Yes	Experiments are performed on a 64-bit Ubuntu 18.04 platform equipped with 64 GB of RAM and an Intel i77700 quad-core processor. ... α, β-CROWN is executed with an additional NVIDIA TITAN RTX GPU with 24 GB memory.
Software Dependencies	Yes	All verification tools are implemented in Python 3.8. For the tools involving MILP solving, α, β-CROWN and Marabou use Gurobi 9.1, while Verinet uses Xpress 9.0 as the backend solver.
Experiment Setup	Yes	Besides, we impose a 30-minute timeout for each verification problem, while maintaining default values for other parameters.