Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Clip-and-Verify: Linear Constraint-Driven Domain Clipping for Accelerating Neural Network Verification

Authors: Duo Zhou, Jorge Chavez, Hesun Chen, Grani A. Hanasusanto, Huan Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the effectiveness of Clip-and-Verify on several benchmarks from VNN-COMP 20212024 [6, 49, 11, 10]. We first demonstrate its benefits on three benchmarks and three hard NN control system problems that are commonly solved with the input Ba B procedure, then evaluate our approach on challenging activation space Ba B benchmarks in VNN-COMPs, as well as SDP-FO benchmarks introduced in previous studies [19, 66]. Fig. 2 shows the overview of results of selected benchmarks. All results visualization and details about the configuration see Appendix D.1. Ablation Studies. We compare the Ba B baseline without clipping, Relaxed Clipping only, and the full pipeline (Relaxed + Complete) in Table 1. We further vary the comparison of complete clipping and LP solvers in Section D.3, and scoring rule for choosing critical neurons (e.g., Ba BSR intercept, envelope gap, bound width) in Complete Clipping in Section D.4. For each configuration we report time and visited subproblems under identical branching/timeout settings in Table 4 and Table 6; detailed breakdowns, per-model trends, and ablation protocols are in Appendix D.2.
Researcher Affiliation	Academia	Duo Zhou Jorge Chavez Hesun Chen Grani A. Hanasusanto Huan Zhang University of Illinois Urbana-Champaign Equal Contribution EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1 Coordinate Ascent for Multiple Constraints Algorithm 2 Linear Constraint-Driven Relaxed Clipping (Parallel) Algorithm 3 Linear Constraint-Driven Relaxed Clipping (sequential) Algorithm 4 Clip-and-Verify for Input Branch-and-bound Algorithm 5 Clip-and-Verify for Activation Branch-and-bound
Open Source Code	Yes	Clip-and-Verify is part of the α,β-CROWN verifier, the VNN-COMP 2025 winner. Code available at https://github.com/Verified-Intelligence/Clip_and_Verify.
Open Datasets	Yes	We evaluate the effectiveness of Clip-and-Verify on several benchmarks from VNN-COMP 20212024 [6, 49, 11, 10]. ... We conduct a detailed ablation study on multiple adversarially-trained models spanning MNIST, CIFAR, and larger VNN-COMP benchmarks
Dataset Splits	No	The paper does not explicitly state the training/test/validation dataset splits (e.g., percentages or absolute counts). It mentions using benchmarks from VNN-COMP and commonly used datasets like MNIST and CIFAR, which typically have standard splits, but it does not detail how the data was partitioned for its specific experiments. For example, it discusses 'per-example verification time' but not how the examples were selected from a split.
Hardware Specification	Yes	To allow for comparability of results, all tools for input Ba B were evaluated on equal-cost hardware with a 32-vcore CPU, one NVIDIA RTX 4090 GPU with 24 GB memory, and 256 GB CPU memory. For Re LU based Ba B experiment, we use a cluster with one AMD EPYC 9534 64-core CPU and the GPU is one NVIDIA RTX 5090 GPU with 32 GB memory and 512 GB CPU memory.
Software Dependencies	Yes	The MIP cuts are acquired by the cplex [34] solver (version 22.1.0.0). We use the Adam optimizer [38] to solve both α, β, µ, τ.
Experiment Setup	Yes	For the SDP-FO benchmarks, we optimize those parameters for 20 iterations with a learning rate of 0.1 for α and 0.02 for β, µ, τ. We decay the learning rates with a factor of 0.98 per iteration. The timeout is 200s per instance. For the VNN-COMP benchmarks, we use the same configuration as α,β-CROWN used in the respective competition and the same timeouts.