Complete Verification via Multi-Neuron Relaxation Guided Branch-and-Bound

Authors: Claudio Ferrari, Mark Niklas Müller, Nikola Jovanović, Martin Vechev

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "An extensive evaluation demonstrates that our verifier achieves a new state-of-the-art on both established benchmarks as well as networks with significantly higher accuracy than previously considered." "We now present an extensive evaluation of our method."
Researcher Affiliation | Academia | "Claudio Ferrari, Mark Niklas Müller, Nikola Jovanović, Martin Vechev, Department of Computer Science, ETH Zurich, Switzerland, {claudio.ferrari, mark.mueller, nikola.jovanovic, martin.vechev}@inf.ethz.ch"
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | "We release all code and scripts to reproduce our experiments at https://github.com/eth-sri/mn-bab."
Open Datasets | Yes | "We benchmark MN-BAB on a wide range of networks (see Table 3 in App. A) on the MNIST (Deng, 2012) and CIFAR10 datasets (Krizhevsky et al., 2009)." All datasets used in the experiments are publicly available.
Dataset Splits | No | The paper mentions evaluating on a 'test set' but does not specify the training or validation splits for the datasets, e.g., exact percentages, sample counts, or an explicit reference to the standard splits (see the dataset-split sketch after the table).
Hardware Specification | Yes | "We implement a GPU-based version of MN-BAB in PyTorch (Paszke et al., 2019) and evaluate all benchmarks using a single NVIDIA RTX 2080Ti, 64 GB of memory, and 16 CPU cores."
Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) but gives no version number for it, nor does it list any other software dependencies with explicit version numbers.
Experiment Setup | Yes | "Experimental Setup: We implement a GPU-based version of MN-BAB in PyTorch (Paszke et al., 2019) and evaluate all benchmarks using a single NVIDIA RTX 2080Ti, 64 GB of memory, and 16 CPU cores. For every subproblem, we first lower-bound the objective using DeepPoly and then compute refined bounds using the method described in Section 3. ResNet6-A and ResNet8-A were trained using 8 steps and ϵ = 4/255, whereas 20 steps and ϵ = 8/255 were used for ResNet6-B. In Table 3 we show the per-sample timeout and the batch sizes that were used in the experiments. We apply two targeted versions (towards all classes) of PGD (Madry et al., 2018) using margin loss (Gowal et al., 2018) and GAMA loss (Sriramanan et al., 2020), both with 5 restarts, 50 steps, and 10-step output diversification (?)."
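To make the attack configuration quoted in the Experiment Setup row concrete, here is a minimal PyTorch sketch of targeted L∞ PGD with a margin loss, using the reported 5 restarts and 50 steps. The function names, the plain uniform random restart, and the step-size heuristic are illustrative assumptions, not the paper's implementation (which additionally uses GAMA loss and output-diversified initialization; see the released repository).

```python
import torch

def margin_loss(logits, target_class, true_class):
    # Margin between the attack target and the true class:
    # maximizing this pushes the prediction towards the target label.
    return logits[:, target_class] - logits[:, true_class]

def targeted_pgd(model, x, true_class, target_class, eps,
                 n_steps=50, n_restarts=5, step_size=None):
    """Hypothetical sketch of targeted L-inf PGD (Madry et al., 2018)
    with margin loss. Returns an adversarial example if any restart
    flips the prediction to `target_class`, else None."""
    step_size = step_size or 2.5 * eps / n_steps  # common heuristic (assumption)
    for _ in range(n_restarts):
        # Uniform random start inside the L-inf ball (the paper instead
        # uses output-diversified initialization here).
        delta = (torch.rand_like(x) * 2 - 1) * eps
        delta.requires_grad_(True)
        for _ in range(n_steps):
            logits = model(x + delta)
            loss = margin_loss(logits, target_class, true_class).sum()
            grad, = torch.autograd.grad(loss, delta)
            with torch.no_grad():
                delta += step_size * grad.sign()          # ascent step
                delta.clamp_(-eps, eps)                   # project to L-inf ball
                delta.copy_((x + delta).clamp(0, 1) - x)  # keep valid pixel range
        with torch.no_grad():
            adv = (x + delta).clamp(0, 1)
            if model(adv).argmax(dim=1).eq(target_class).all():
                return adv
    return None
```

Running this attack (and a GAMA-loss variant) towards every target class, and handing only the surviving samples to the branch-and-bound verifier, would match the counterexample-search role these attacks play in the paper.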
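Regarding the Dataset Splits row above: a plausible reading is that the evaluation uses the canonical MNIST and CIFAR-10 test sets, though the paper does not state this explicitly. A minimal sketch of those standard splits via torchvision (an assumption for illustration, not the paper's loading code):

```python
import torchvision

# Canonical splits: MNIST ships 60,000 train / 10,000 test images,
# CIFAR-10 ships 50,000 train / 10,000 test images.
mnist_test = torchvision.datasets.MNIST(root="data", train=False, download=True)
cifar_test = torchvision.datasets.CIFAR10(root="data", train=False, download=True)
print(len(mnist_test), len(cifar_test))  # -> 10000 10000
```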