Scalable Neural Network Verification with Branch-and-bound Inferred Cutting Planes

Authors: Duo Zhou, Christopher Brix, Grani A. Hanasusanto, Huan Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below lists the extracted result together with the supporting LLM response.

Research Type: Experimental
LLM Response: "Our results demonstrate that BICCOS can generate hundreds of useful cuts during the branch-and-bound process and consistently increase the number of verifiable instances compared to other state-of-the-art neural network verifiers on a wide range of benchmarks, including large networks that previous cutting plane methods could not scale to. BICCOS is part of the α,β-CROWN verifier, the VNN-COMP 2024 winner. The code is available at https://github.com/Lemutisme/BICCOS." and, from Section 4 (Experiments): "We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]."

Researcher Affiliation: Academia
LLM Response: "¹University of Illinois Urbana-Champaign, ²RWTH Aachen University"

Pseudocode: Yes
LLM Response: "Algorithm 1 Constraint Strengthening" and "Algorithm 2 Branch-and-bound Inferred Cuts with Constraint Strengthening (BICCOS)". A hedged sketch of the constraint-strengthening step appears after the table.

Open Source Code: Yes
LLM Response: "The code is available at https://github.com/Lemutisme/BICCOS."

Open Datasets: Yes
LLM Response: "We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]." and "On both the MNIST and CIFAR datasets, BICCOS not only surpasses the performance of methods like β-CROWN and GCP-CROWN on the CNN-A-Adv model but also approaches the empirical robust accuracy upper bound, leaving only a marginal gap."

Dataset Splits: No
LLM Response: "We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]." (Explanation: the paper uses the VNN-COMP and SDP-FO benchmarks, which are typically well defined, but it does not explicitly state training/validation/test split percentages, sample counts, or citations defining those splits.)

Hardware Specification: Yes
LLM Response: "Our experiments are conducted on a server with an Intel Xeon 8468 Sapphire CPU, one NVIDIA H100 GPU (96 GB GPU memory), and 480 GB CPU memory."

Software Dependencies: Yes
LLM Response: "Our implementation is based on the open-source α,β-CROWN verifier with cutting-plane-related code added." and "The MIP cuts are acquired by the CPLEX [31] solver (version 22.1.0.0)."

Experiment Setup: Yes
LLM Response: "For the SDP-FO benchmarks, we optimize those parameters for 20 iterations with a learning rate of 0.1 for α and 0.02 for β, µ, τ. We decay the learning rates with a factor of 0.98 per iteration. The timeout is 200s per instance. For the VNN-COMP benchmarks, we use the same configuration as α,β-CROWN used in the respective competition and the same timeouts. During constraint strengthening, we set drop_percentage = 50%. We only perform one round of strengthening, with no recursive strengthening attempts. We perform constraint strengthening for the first 40 BaB iterations. For multi-tree search, we perform 5 branching iterations, where we each time pick the current best 50 domains and split them to generate 400 new sub-domains." The reported hyperparameters are collected in a sketch after the table.
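
For the Pseudocode row above: a minimal sketch of the constraint-strengthening idea behind Algorithm 1, assuming a BaB-inferred cut is represented by the list of branching literals that defined a verified subproblem, and assuming a callback `is_verified(literals)` that re-checks the reduced subproblem with bound propagation. Both names are illustrative, not the authors' API.

```python
# Hedged sketch of constraint strengthening (Algorithm 1 in the paper).
# `literals` and `is_verified` are assumed, illustrative interfaces.

def strengthen_cut(literals, is_verified, drop_percentage=0.5):
    """Try to shorten a BaB-inferred cut.

    A verified subproblem defined by a conjunction of branching decisions
    yields a valid cut; if the subproblem stays verified after dropping some
    decisions, the shorter conjunction gives a stronger cut.
    """
    n_drop = max(1, int(len(literals) * drop_percentage))
    candidate = literals[:-n_drop]  # tentatively drop a fraction of literals
    # One strengthening round, no recursion (matching the reported setup).
    if candidate and is_verified(candidate):
        return candidate  # fewer literals => the cut excludes more of the space
    return literals       # strengthening failed; keep the original cut
```

Dropping a large fraction at once (the reported drop_percentage of 50%) keeps the number of re-verification calls low, at the cost of a lower chance that the reduced subproblem still verifies.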
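
For the Experiment Setup row above: the reported hyperparameters collected into a plain Python dict for quick reference. The key names are assumptions for illustration only and do not mirror α,β-CROWN's actual configuration schema.

```python
# Reported BICCOS hyperparameters (SDP-FO benchmarks); key names are
# illustrative, not the verifier's real config keys.
biccos_sdp_fo_setup = {
    "opt_iterations": 20,                # optimization steps for bound parameters
    "lr_alpha": 0.1,                     # learning rate for α
    "lr_beta_mu_tau": 0.02,              # learning rate for β, µ, τ
    "lr_decay": 0.98,                    # lr_t = lr_0 * 0.98**t, per iteration
    "timeout_seconds": 200,              # per-instance timeout
    "drop_percentage": 0.50,             # literals dropped per strengthening attempt
    "strengthening_rounds": 1,           # no recursive strengthening
    "strengthening_bab_iterations": 40,  # strengthen only in the first 40 BaB iterations
    "multi_tree_branching_iterations": 5,
    "multi_tree_best_domains": 50,       # best domains picked per iteration
    "multi_tree_new_subdomains": 400,    # new sub-domains generated per iteration
}
```

For the VNN-COMP benchmarks, the paper instead reuses the configuration and timeouts that α,β-CROWN used in the respective competition.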