Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Scalable Neural Network Verification with Branch-and-bound Inferred Cutting Planes
Authors: Duo Zhou, Christopher Brix, Grani A. Hanasusanto, Huan Zhang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that BICCOS can generate hundreds of useful cuts during the branch-and-bound process and consistently increase the number of verifiable instances compared to other state-of-the-art neural network verifiers on a wide range of benchmarks, including large networks that previous cutting plane methods could not scale to. BICCOS is part of the α,β-CROWN verifier, the VNN-COMP 2024 winner. The code is available at https://github.com/Lemutisme/BICCOS. and 4 Experiments We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]. |
| Researcher Affiliation | Academia | 1University of Illinois Urbana-Champaign 2RWTH Aachen University |
| Pseudocode | Yes | Algorithm 1 Constraint Strengthening and Algorithm 2 Branch-and-bound Inferred Cuts with Constraint Strengthening (BICCOS). |
| Open Source Code | Yes | The code is available at https://github.com/Lemutisme/BICCOS. |
| Open Datasets | Yes | We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]. and On both MNIST and CIFAR dataset, BICCOS not only surpasses the performance of methods like β-CROWN and GCP-CROWN on the CNN-A-Adv model but also approaches the empirical robust accuracy upper bound, leaving only a marginal gap. |
| Dataset Splits | No | We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]. (Explanation: The paper mentions using VNN-COMP and SDP-FO benchmarks, which are typically well-defined, but it does not explicitly provide the specific training/validation/test dataset split percentages, sample counts, or direct citations for how these splits are defined within the paper itself.) |
| Hardware Specification | Yes | Our experiments are conducted on a server with an Intel Xeon 8468 Sapphire CPU, one NVIDIA H100 GPU (96 GB GPU memory), and 480 GB CPU memory. |
| Software Dependencies | Yes | Our implementation is based on the open-source α,β-CROWN verifier1 with cutting plane related code added. and The MIP cuts are acquired by the cplex [31] solver (version 22.1.0.0). |
| Experiment Setup | Yes | For the SDP-FO benchmarks, we optimize those parameters for 20 iterations with a learning rate of 0.1 for α and 0.02 for β, µ, τ. We decay the learning rates with a factor of 0.98 per iteration. The timeout is 200s per instance. For the VNN-COMP benchmarks, we use the same configuration as α,β-CROWN used in the respective competition and the same timeouts. During constraint strengthening, we set drop_percentage = 50%. We only perform one round of strengthening, with no recursive strengthening attempts. We perform constraint strengthening for the first 40 Ba B iterations. For multi-tree search, we perform 5 branching iterations, where we each time pick the current best 50 domains and split them to generate 400 new sub-domains. |