Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming

Authors: Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy R. Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy S. Liang, Pushmeet Kohli

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For two verification-agnostic networks on MNIST and CIFAR-10, we significantly improve ℓ∞ verified robust accuracy from 1% to 88% and from 6% to 40%, respectively. We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
Researcher Affiliation | Collaboration | 1 DeepMind, 2 Google Brain, 3 Stanford, 4 UC Berkeley, 5 Work done at Google
Pseudocode | Yes | Algorithm 1: Verification via SDP-FO (see the first-order ascent sketch after this table).
Open Source Code | Yes | Code available at https://github.com/deepmind/jax_verify (see the usage sketch after this table).
Open Datasets | Yes | For two verification-agnostic networks on MNIST and CIFAR-10, we significantly improve ℓ∞ verified robust accuracy from 1% to 88% and from 6% to 40%, respectively.
Dataset Splits | No | The paper mentions using the MNIST and CIFAR-10 datasets, refers to '500 test set examples', and implies a validation set in 'initial grid search to find a good set of hyperparameters on the validation set'. However, it does not provide explicit training/validation/test splits, percentages, or sample counts in the main text or readily accessible appendix sections.
Hardware Specification | Yes | Using a P100 GPU, maximum runtime is roughly 15 minutes per MLP instance and 3 hours per CNN instance, though most instances are verified sooner.
Software Dependencies | No | The paper mentions ML frameworks such as TensorFlow, PyTorch, and JAX, and states that the core logic is implemented in JAX. However, it does not provide specific version numbers for these software dependencies (e.g., JAX 0.x.y or PyTorch 1.z).
Experiment Setup | Yes | Complete training and hyperparameter details are included in Appendix B.1. All networks are trained for 50 epochs using Adam with a learning rate of 0.001. We use a batch size of 256 for MNIST and 128 for CIFAR-10 (these optimizer settings are sketched after this table).
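
Algorithm 1 (SDP-FO) verifies a network by running projected first-order ascent on the dual of an SDP relaxation, so any dual-feasible iterate already yields a valid certificate. The following is a minimal, hypothetical sketch of that outer loop in JAX/optax: `dual_bound`, `verify_via_first_order_ascent`, and the nonnegativity projection are illustrative stand-ins, not the paper's released implementation, whose dual objective involves a Lanczos-based minimum-eigenvalue computation.

```python
import jax
import jax.numpy as jnp
import optax


def dual_bound(dual_vars):
    """Placeholder concave dual objective.

    In SDP-FO the analogous quantity is the dual of the SDP relaxation, whose
    evaluation involves the minimum eigenvalue of a matrix built from the
    network weights and dual variables. Any feasible dual point yields a valid
    bound, so stopping early is always sound.
    """
    return jnp.sum(dual_vars) - jnp.sum(dual_vars ** 2)


def verify_via_first_order_ascent(init_dual_vars, steps=1000, lr=1e-3):
    """Projected first-order (Adam) ascent on the dual bound."""
    opt = optax.adam(lr)
    opt_state = opt.init(init_dual_vars)
    # Ascent is implemented as descent on the negated objective.
    grad_fn = jax.grad(lambda d: -dual_bound(d))

    dual_vars, best = init_dual_vars, -jnp.inf
    for _ in range(steps):
        grads = grad_fn(dual_vars)
        updates, opt_state = opt.update(grads, opt_state)
        dual_vars = optax.apply_updates(dual_vars, updates)
        # Project constrained dual variables back onto the feasible set
        # (nonnegativity here, purely as an illustration).
        dual_vars = jnp.maximum(dual_vars, 0.0)
        best = jnp.maximum(best, dual_bound(dual_vars))
    return best


# Example: ascend from a zero initialisation of ten dual variables.
print(verify_via_first_order_ascent(jnp.zeros(10)))
```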
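
The released jax_verify library exposes bound-propagation entry points in addition to the SDP-FO verifier described in the paper. Below is a small usage sketch following the interval-bound-propagation pattern from the repository's documentation; the toy `model_fn` and `eps` are illustrative, and exact API names may differ between library versions.

```python
import jax.numpy as jnp
import jax_verify


def model_fn(x):
    """Toy two-layer ReLU network standing in for a trained classifier."""
    w1, b1 = jnp.ones((4, 8)), jnp.zeros(8)
    w2, b2 = jnp.ones((8, 2)), jnp.zeros(2)
    return jnp.maximum(x @ w1 + b1, 0.0) @ w2 + b2


x = jnp.zeros((1, 4))   # nominal input
eps = 0.1               # L-infinity perturbation radius

# Bound every output of the network over the L-infinity ball around x.
input_bound = jax_verify.IntervalBound(x - eps, x + eps)
output_bound = jax_verify.interval_bound_propagation(model_fn, input_bound)
print(output_bound.lower, output_bound.upper)
```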
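
The quoted experiment setup (50 epochs of Adam at learning rate 0.001, batch size 256 for MNIST and 128 for CIFAR-10) maps onto a simple optimizer configuration. The sketch below only shows where those reported values would plug in; `TRAIN_CONFIG` and `make_optimizer` are hypothetical names, not part of the released code.

```python
import optax

# Hyperparameters quoted from the reproduced experiment setup.
TRAIN_CONFIG = {
    "mnist": {"epochs": 50, "learning_rate": 1e-3, "batch_size": 256},
    "cifar10": {"epochs": 50, "learning_rate": 1e-3, "batch_size": 128},
}


def make_optimizer(dataset: str) -> optax.GradientTransformation:
    """Build the Adam optimizer with the learning rate reported in the paper."""
    return optax.adam(TRAIN_CONFIG[dataset]["learning_rate"])
```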