Towards Certifying ℓ∞ Robustness using Neural Networks with ℓ∞-dist Neurons
Authors: Bohang Zhang, Tianle Cai, Zhou Lu, Di He, Liwei Wang
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that using ℓ∞-dist nets as basic building blocks, we consistently achieve state-of-the-art performance on commonly used datasets: 93.09% certified accuracy on MNIST (ϵ = 0.3), 35.42% on CIFAR-10 (ϵ = 8/255) and 16.31% on TinyImageNet (ϵ = 1/255). |
| Researcher Affiliation | Collaboration | 1Key Laboratory of Machine Perception, MOE, School of EECS, Peking University 2Department of Electrical and Computer Engineering, Princeton University 3Zhongguancun Haihua Institute for Frontier Information Technology 4Department of Computer Science, Princeton University 5Microsoft Research 6Center for Data Science, Peking University. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Finally, we provide all the implementation details and codes at https://github.com/zbh2047/L_inf-dist-net. |
| Open Datasets | Yes | We train our models on four popular benchmark datasets: MNIST, Fashion-MNIST, CIFAR-10 and TinyImageNet. |
| Dataset Splits | No | The paper mentions training and testing procedures and some data augmentation strategies, but it does not specify explicit training/validation/test splits by percentage or count. It refers to the 'training set' and 'test set' but does not describe a separate validation split or any splitting ratios. |
| Hardware Specification | Yes | All these experiments are run on a single NVIDIA-RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions 'Adam optimizer with hyper-parameters β1 = 0.9, β2 = 0.99 and ϵ = 10⁻¹⁰', but it does not specify any software names with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | In all experiments, we train ℓ∞-dist Net and ℓ∞-dist Net+MLP using Adam optimizer with hyper-parameters β1 = 0.9, β2 = 0.99 and ϵ = 10⁻¹⁰. The batch size is set to 512. For data augmentation, we use random crop (padding=1) for MNIST and Fashion-MNIST, and use random crop (padding=4) and random horizontal flip for CIFAR-10, following the common practice. For the TinyImageNet dataset, we use random horizontal flip and crop each image to 56×56 pixels for training, and use a center crop for testing, which is the same as Xu et al. (2020a). As for the loss function, we use multi-class hinge loss for ℓ∞-dist Net and the IBP loss (Gowal et al., 2018) for ℓ∞-dist Net+MLP. The training procedure is as follows. First, we relax the ℓ∞-dist net to an ℓp-dist net by setting p = 8 and train the network for e1 epochs. Then we gradually increase p from 8 to 1000 exponentially in the next e2 epochs. Finally, we set p = ∞ and train the last e3 epochs. Here e1, e2 and e3 are hyper-parameters that vary across datasets. We use lr = 0.02 in the first e1 epochs and decrease the learning rate using cosine annealing for the next e2 + e3 epochs. We use ℓp-norm weight decay for ℓ∞-dist nets and ℓ2-norm weight decay for the MLP with coefficient λ = 0.005. All these explicitly specified hyper-parameters are kept fixed across different architectures and datasets. For ℓ∞-dist Net+MLP training, we use the same linear warmup strategy for the hyper-parameter ϵtrain as in Gowal et al. (2018); Zhang et al. (2020b). See Appendix D (Table 6) for details of the training configuration and hyper-parameters. A sketch of the ℓp-dist neuron and this p-relaxation schedule is given below the table. |
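
Below is a minimal sketch, assuming PyTorch, of the ℓp-dist unit u(x) = ‖x − w‖_p + b from the paper and of an exponential p schedule matching the setup quoted above (p held at 8 for e1 epochs, grown to 1000 over the next e2 epochs, then set to ∞). The names `LpDistNeuron` and `p_schedule`, the exact ramp formula, and the shapes in the usage example are illustrative assumptions, not the authors' implementation; see the linked repository for the reference code.

```python
# Minimal sketch (not the authors' code) of an l_p-dist neuron and the
# p-relaxation schedule described in the experiment setup above.
import math
import torch
import torch.nn as nn


class LpDistNeuron(nn.Module):
    """l_p-dist unit: u(x) = ||x - w||_p + b for each output unit.

    As p -> infinity this approaches the l_inf-dist neuron described in the paper.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor, p: float) -> torch.Tensor:
        # x: (batch, in_features); diff: (batch, out_features, in_features)
        diff = x.unsqueeze(1) - self.weight.unsqueeze(0)
        if math.isinf(p):
            dist = diff.abs().amax(dim=-1)                        # exact l_inf distance
        else:
            dist = torch.linalg.vector_norm(diff, ord=p, dim=-1)  # l_p relaxation
        return dist + self.bias


def p_schedule(epoch: int, e1: int, e2: int,
               p_start: float = 8.0, p_end: float = 1000.0) -> float:
    """Hold p = 8 for the first e1 epochs, grow it exponentially to 1000 over
    the next e2 epochs, then switch to p = inf for the remaining e3 epochs."""
    if epoch < e1:
        return p_start
    if epoch < e1 + e2:
        t = (epoch - e1 + 1) / e2                 # fraction of the ramp completed
        return p_start * (p_end / p_start) ** t
    return math.inf


# Hypothetical usage with the hyper-parameters quoted above (lr = 0.02,
# betas = (0.9, 0.99), eps = 1e-10, batch size 512); layer sizes are illustrative.
layer = LpDistNeuron(in_features=784, out_features=128)
opt = torch.optim.Adam(layer.parameters(), lr=0.02, betas=(0.9, 0.99), eps=1e-10)
x = torch.randn(512, 784)
out = layer(x, p=p_schedule(epoch=0, e1=10, e2=50))  # -> shape (512, 128)
```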