Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
Authors: Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy R. Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy S. Liang, Pushmeet Kohli
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For two verification-agnostic networks on MNIST and CIFAR-10, we significantly improve verified robust accuracy from 1% to 88% and 6% to 40% respectively. We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder. |
| Researcher Affiliation | Collaboration | ¹DeepMind ²Google Brain ³Stanford ⁴UC Berkeley ⁵Work done at Google |
| Pseudocode | Yes | Algorithm 1 Verification via SDP-FO |
| Open Source Code | Yes | Code available at https://github.com/deepmind/jax_verify. |
| Open Datasets | Yes | For two verification-agnostic networks on MNIST and CIFAR-10, we significantly improve verified robust accuracy from 1% to 88% and 6% to 40% respectively. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets, and refers to '500 test set examples' and implies the use of a validation set in 'initial grid search to find a good set of hyperparameters on the validation set'. However, it does not provide explicit details about the specific training/validation/test splits, percentages, or sample counts used for reproduction within the main text or readily accessible appendix sections. |
| Hardware Specification | Yes | Using a P100 GPU, maximum runtime is roughly 15 minutes per MLP instance and 3 hours per CNN instance, though most instances are verified sooner. |
| Software Dependencies | No | The paper mentions using ML frameworks like TensorFlow, PyTorch, or JAX, and states that core logic is implemented in JAX. However, it does not provide specific version numbers for these software dependencies (e.g., JAX 0.x.y or PyTorch 1.z). |
| Experiment Setup | Yes | Complete training and hyperparameter details are included in Appendix B.1. All networks are trained for 50 epochs using Adam with a learning rate of 0.001. We use a batch size of 256 for MNIST and 128 for CIFAR-10. |