Complete Verification via Multi-Neuron Relaxation Guided Branch-and-Bound

Authors: Claudio Ferrari, Mark Niklas Müller, Nikola Jovanović, Martin Vechev

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "An extensive evaluation demonstrates that our verifier achieves a new state-of-the-art on both established benchmarks as well as networks with significantly higher accuracy than previously considered." "We now present an extensive evaluation of our method."
Researcher Affiliation | Academia | "Claudio Ferrari, Mark Niklas Müller, Nikola Jovanović, Martin Vechev, Department of Computer Science, ETH Zurich, Switzerland, {claudio.ferrari, mark.mueller, nikola.jovanovic, martin.vechev}@inf.ethz.ch"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We release all code and scripts to reproduce our experiments at https://github.com/eth-sri/mn-bab."
Open Datasets | Yes | "We benchmark MN-BAB on a wide range of networks (see Table 3 in App. A) on the MNIST (Deng, 2012) and CIFAR10 datasets (Krizhevsky et al., 2009)." All datasets used in the experiments are publicly available.
Dataset Splits | No | The paper mentions evaluating on a 'test set' but does not specify the training or validation splits for the datasets, e.g., exact percentages, sample counts, or an explicit reference to the standard splits (see the dataset-split sketch after the table).
Hardware Specification | Yes | "We implement a GPU-based version of MN-BAB in PyTorch (Paszke et al., 2019) and evaluate all benchmarks using a single NVIDIA RTX 2080Ti, 64 GB of memory, and 16 CPU cores."
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) but gives no version number for it, nor does it list any other software dependencies with explicit version numbers.
Experiment Setup | Yes | "Experimental Setup: We implement a GPU-based version of MN-BAB in PyTorch (Paszke et al., 2019) and evaluate all benchmarks using a single NVIDIA RTX 2080Ti, 64 GB of memory, and 16 CPU cores. For every subproblem, we first lower-bound the objective using DeepPoly and then compute refined bounds using the method described in Section 3. ResNet6-A and ResNet8-A were trained using 8 steps and ϵ = 4/255, whereas 20 steps and ϵ = 8/255 were used for ResNet6-B. In Table 3 we show the per-sample timeout and the batch sizes that were used in the experiments. We apply two targeted versions (towards all classes) of PGD (Madry et al., 2018) using margin loss (Gowal et al., 2018) and GAMA loss (Sriramanan et al., 2020), both with 5 restarts, 50 steps, and 10-step output diversification (?)."
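To make the attack configuration quoted in the Experiment Setup row concrete, here is a minimal PyTorch sketch of targeted L∞ PGD with a margin loss, using the reported 5 restarts and 50 steps. The function names, the plain uniform random restart, and the step-size heuristic are illustrative assumptions, not the paper's implementation (which additionally uses GAMA loss and output-diversified initialization; see the released repository).

```python
import torch

def margin_loss(logits, target_class, true_class):
    # Margin between the attack target and the true class:
    # maximizing this pushes the prediction towards the target label.
    return logits[:, target_class] - logits[:, true_class]

def targeted_pgd(model, x, true_class, target_class, eps,
                 n_steps=50, n_restarts=5, step_size=None):
    """Hypothetical sketch of targeted L-inf PGD (Madry et al., 2018)
    with margin loss. Returns an adversarial example if any restart
    flips the prediction to `target_class`, else None."""
    step_size = step_size or 2.5 * eps / n_steps  # common heuristic (assumption)
    for _ in range(n_restarts):
        # Uniform random start inside the L-inf ball (the paper instead
        # uses output-diversified initialization here).
        delta = (torch.rand_like(x) * 2 - 1) * eps
        delta.requires_grad_(True)
        for _ in range(n_steps):
            logits = model(x + delta)
            loss = margin_loss(logits, target_class, true_class).sum()
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += step_size * grad.sign()          # ascent step
                delta.clamp_(-eps, eps)                   # project to L-inf ball
                delta.copy_((x + delta).clamp(0, 1) - x)  # keep valid pixel range
        with torch.no_grad():
            adv = (x + delta).clamp(0, 1)
            if model(adv).argmax(dim=1).eq(target_class).all():
                return adv
    return None
```

Running this attack (and a GAMA-loss variant) towards every target class, and handing only the surviving samples to the branch-and-bound verifier, would match the counterexample-search role these attacks play in the paper.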
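Regarding the Dataset Splits row above: a plausible reading is that the evaluation uses the canonical MNIST and CIFAR-10 test sets, though the paper does not state this explicitly. A minimal sketch of those standard splits via torchvision (an assumption for illustration, not the paper's loading code):

```python
import torchvision

# Canonical splits: MNIST ships 60,000 train / 10,000 test images,
# CIFAR-10 ships 50,000 train / 10,000 test images.
mnist_test = torchvision.datasets.MNIST(root="data", train=False, download=True)
cifar_test = torchvision.datasets.CIFAR10(root="data", train=False, download=True)
print(len(mnist_test), len(cifar_test))  # -> 10000 10000
```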