Efficient Certification of Spatial Robustness

Authors: Anian Ruoss, Maximilian Baader, Mislav Balunović, Martin Vechev (pp. 2504-2513)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on various network architectures and different datasets demonstrate the effectiveness and scalability of our method. We now investigate the precision and scalability of our certification method by evaluating it on a rich combination of datasets and network architectures.
Researcher Affiliation | Academia | Anian Ruoss, Maximilian Baader, Mislav Balunović, Martin Vechev, Department of Computer Science, ETH Zurich. anruoss@ethz.ch, {mbaader, mislav.balunovic, martin.vechev}@inf.ethz.ch
Pseudocode | No | The paper describes its methods primarily through prose and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We make our code publicly available as part of the ERAN framework for neural network verification (available at https://github.com/eth-sri/eran).
Open Datasets | Yes | We select a random subset of 100 images from the MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) test datasets on which we run all experiments.
Dataset Splits | No | We select a random subset of 100 images from the MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) test datasets on which we run all experiments. While the paper uses established test datasets, it does not explicitly provide details about the train/validation/test splits used for their models, beyond stating they used a subset of test images for evaluation.
Hardware Specification | Yes | We use a desktop PC with a single GeForce RTX 2080 Ti GPU and a 16-core Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz...
Software Dependencies | No | The paper mentions software components such as DeepPoly, k-ReLU, MILP, and ReLU activations, but it does not specify their version numbers or other software dependencies with concrete version information.
Experiment Setup | Yes | We select a random subset of 100 images from the MNIST (LeCun, Cortes, and Burges 2010) and CIFAR-10 (Krizhevsky 2009) test datasets on which we run all experiments. We consider adversarially trained variants of the CONVSMALL, CONVMED, and CONVBIG architectures proposed by Mirman, Gehr, and Vechev (2018), using PGD (Madry et al. 2018) and DiffAI (Mirman, Gehr, and Vechev 2018) for adversarial training. For CIFAR-10, we also consider a ResNet (He et al. 2016), with 4 residual blocks of 16, 16, 32, and 64 filters each, trained with the provable defense from Wong et al. (2018). We present the model accuracies and training hyperparameters in Appendix B. We limit MILP to 5 minutes...
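
The subset-selection step quoted above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the random seed, and the default test-set size (10,000 images for both MNIST and CIFAR-10) are our own assumptions, since the paper does not state how the 100 images were drawn beyond "a random subset".

```python
import random


def select_eval_subset(num_test_images: int = 10000,
                       subset_size: int = 100,
                       seed: int = 0) -> list:
    """Sample a fixed random subset of test-set indices for certification.

    Mirrors the paper's setup of evaluating on a random subset of 100
    MNIST/CIFAR-10 test images. The seed is a hypothetical choice made
    here so that the subset is reproducible across runs.
    """
    rng = random.Random(seed)
    # Sample without replacement so each test image is certified at most once.
    return sorted(rng.sample(range(num_test_images), subset_size))


indices = select_eval_subset()
```

Fixing the seed matters for a reproducibility study: without it, each certification run would evaluate a different 100-image subset and the reported verified-accuracy numbers would not be directly comparable.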