Scalable Neural Network Verification with Branch-and-bound Inferred Cutting Planes
Authors: Duo Zhou, Christopher Brix, Grani A. Hanasusanto, Huan Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results demonstrate that BICCOS can generate hundreds of useful cuts during the branch-and-bound process and consistently increase the number of verifiable instances compared to other state-of-the-art neural network verifiers on a wide range of benchmarks, including large networks that previous cutting plane methods could not scale to. BICCOS is part of the α,β-CROWN verifier, the VNN-COMP 2024 winner. The code is available at https://github.com/Lemutisme/BICCOS. and, from Section 4 (Experiments): We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]. |
| Researcher Affiliation | Academia | ¹University of Illinois Urbana-Champaign, ²RWTH Aachen University |
| Pseudocode | Yes | Algorithm 1 Constraint Strengthening and Algorithm 2 Branch-and-bound Inferred Cuts with Constraint Strengthening (BICCOS). (A hedged sketch of the strengthening step appears after the table.) |
| Open Source Code | Yes | The code is available at https://github.com/Lemutisme/BICCOS. |
| Open Datasets | Yes | We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]. and On both the MNIST and CIFAR datasets, BICCOS not only surpasses the performance of methods like β-CROWN and GCP-CROWN on the CNN-A-Adv model but also approaches the empirical robust accuracy upper bound, leaving only a marginal gap. |
| Dataset Splits | No | We evaluate our verifier, BICCOS, on several popular verification benchmarks from VNN-COMP [4, 40, 10] and on the SDP-FO benchmarks used in multiple studies [15, 54, 41]. (Explanation: The paper mentions using VNN-COMP and SDP-FO benchmarks, which are typically well-defined, but it does not explicitly provide the specific training/validation/test dataset split percentages, sample counts, or direct citations for how these splits are defined within the paper itself.) |
| Hardware Specification | Yes | Our experiments are conducted on a server with an Intel Xeon 8468 Sapphire CPU, one NVIDIA H100 GPU (96 GB GPU memory), and 480 GB CPU memory. |
| Software Dependencies | Yes | Our implementation is based on the open-source α,β-CROWN verifier with cutting-plane-related code added. and The MIP cuts are acquired by the CPLEX [31] solver (version 22.1.0.0). |
| Experiment Setup | Yes | For the SDP-FO benchmarks, we optimize those parameters for 20 iterations with a learning rate of 0.1 for α and 0.02 for β, µ, τ. We decay the learning rates with a factor of 0.98 per iteration. The timeout is 200s per instance. For the VNN-COMP benchmarks, we use the same configuration as α,β-CROWN used in the respective competition and the same timeouts. During constraint strengthening, we set drop_percentage = 50%. We only perform one round of strengthening, with no recursive strengthening attempts. We perform constraint strengthening for the first 40 BaB iterations. For multi-tree search, we perform 5 branching iterations, where we each time pick the current best 50 domains and split them to generate 400 new sub-domains. (These settings are collected in the hypothetical config sketch after the table.) |
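
To make the Pseudocode row concrete: the paper's Algorithm 1 strengthens a branch-and-bound inferred cut by trying to drop some of its literals while the subdomain stays verified. Below is a minimal Python sketch of that idea; `Cut` and the `verified_lower_bound` oracle are hypothetical stand-ins, not the actual BICCOS API, and the real implementation in α,β-CROWN differs in data structures and batching.

```python
import random
from dataclasses import dataclass

@dataclass
class Cut:
    # A cut is induced by the split decisions of a verified subdomain;
    # each literal is e.g. a (layer, neuron, sign) split. Fewer literals
    # means a stronger, more widely applicable cut.
    literals: list

def strengthen(cut, verified_lower_bound, drop_percentage=0.5):
    """One round of constraint strengthening: attempt to drop a fraction
    of the literals, keeping each drop only if the relaxed subdomain
    still verifies (certified lower bound stays positive)."""
    n_drop = max(1, int(len(cut.literals) * drop_percentage))
    for lit in random.sample(cut.literals, n_drop):
        trial = [l for l in cut.literals if l != lit]
        if trial and verified_lower_bound(trial) > 0:
            cut.literals = trial  # drop succeeded: cut is strengthened
    return cut
```

With drop_percentage = 0.5 and a single pass (no recursive strengthening attempts), this matches the settings quoted in the Experiment Setup row.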
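
The hyperparameters quoted in the Experiment Setup row can also be read as one configuration. The dictionary below is purely illustrative: the key names are hypothetical and do not correspond to α,β-CROWN's actual config schema; only the values come from the paper.

```python
# Illustrative summary of the SDP-FO experiment settings quoted above.
# Key names are hypothetical; values are taken from the paper.
sdp_fo_config = {
    "optimizer": {
        "iterations": 20,
        "lr_alpha": 0.1,          # learning rate for alpha
        "lr_beta_mu_tau": 0.02,   # shared learning rate for beta, mu, tau
        "lr_decay": 0.98,         # multiplicative decay per iteration
    },
    "timeout_seconds": 200,       # per instance (SDP-FO benchmarks)
    "biccos": {
        "drop_percentage": 0.5,   # constraint strengthening
        "strengthening_rounds": 1,         # no recursive attempts
        "strengthen_first_n_bab_iters": 40,
        "multi_tree_search": {
            "branching_iterations": 5,
            "best_domains_per_iteration": 50,
            "new_subdomains_per_iteration": 400,
        },
    },
}
```

For the VNN-COMP benchmarks, the paper instead reuses the α,β-CROWN competition configurations and timeouts, so no separate settings are listed there.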