Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

General Cutting Planes for Bound-Propagation-Based Neural Network Verification

Authors: Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, J. Zico Kolter

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments demonstrate that our method is the first verifier that can completely solve the oval20 benchmark and verify twice as many instances on the oval21 benchmark compared to the best tool in VNN-COMP 2021, and also noticeably outperforms state-of-the-art verifiers on a wide range of benchmarks.
Researcher Affiliation	Collaboration	CMU; Columbia University; Drexel University; UIUC; UCLA; Bosch Center for AI
Pseudocode	No	The paper does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code	Yes	Code is available at http://PaperCode.cc/GCP-CROWN.
Open Datasets	Yes	Results on the oval20 benchmark in VNN-COMP 2020. ... Results on VNN-COMP 2021 benchmarks: oval21 and cifar10-resnet. ... Results on SDP-FO benchmarks.
Dataset Splits	Yes	For the oval20 benchmark, we use the same setup as in [59]. For the VNN-COMP 2021 benchmarks (oval21 and cifar10-resnet), we use the default parameters set for VNN-COMP. For SDP-FO benchmark, we use the default parameters from [57].
Hardware Specification	Yes	All experiments are run on a single machine with 64 CPU cores, 128GB memory, and one NVIDIA GeForce RTX 3090 GPU.
Software Dependencies	No	The paper mentions software such as Python, PyTorch, CPLEX, and Gurobi but does not specify their version numbers.
Experiment Setup	Yes	We use the same branch and bound algorithm as in β-CROWN and we use filtered smart branching (FSB) [16] as the branching heuristic in all experiments. ... When the number of BaB subdomains are greater than batch size, we rank the subdomains by their lower bounds and choose the easiest domains with largest lower bounds first to verify with GCP-CROWN... The total runtime for a benchmark is 3600 seconds as specified by VNN-COMP 2020.
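The subdomain selection rule quoted in the Experiment Setup row can be illustrated with a short sketch. It is not taken from the GCP-CROWN code base. It assumes a hypothetical representation in which each branch-and-bound (BaB) subdomain is a `(lower_bound, domain_id)` pair, and it uses a hypothetical helper, `select_easiest_subdomains`. In this setting a subdomain counts as verified once its lower bound exceeds 0, so the largest (least negative) lower bounds mark the "easiest" domains. When more subdomains exist than fit in one batch, the sketch keeps the `batch_size` subdomains with the largest lower bounds.

```python
import heapq
from typing import List, Tuple

# Hypothetical representation (an assumption, not the GCP-CROWN data structure):
# each BaB subdomain is a (lower_bound, domain_id) pair.
Subdomain = Tuple[float, str]


def select_easiest_subdomains(subdomains: List[Subdomain], batch_size: int) -> List[Subdomain]:
    """Return at most batch_size subdomains, ordered from largest to smallest lower bound.

    A larger lower bound means the subdomain is closer to being verified
    (verification succeeds once the lower bound is > 0), so it is treated
    as "easier" and processed first.
    """
    if len(subdomains) <= batch_size:
        # Everything fits in one batch; still return them easiest-first.
        return sorted(subdomains, key=lambda d: d[0], reverse=True)
    # More subdomains than the batch can hold: keep the batch_size largest lower bounds.
    return heapq.nlargest(batch_size, subdomains, key=lambda d: d[0])
```

For example, given the subdomains `(-0.5, "a")`, `(-0.01, "b")`, `(-2.0, "c")`, and `(-0.1, "d")` with a batch size of 2, the sketch selects `"b"` and then `"d"`. `heapq.nlargest` avoids fully sorting a large pool of pending subdomains when only one batch is needed.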