Provable Defense Against Geometric Transformations
Authors: Rem Yang, Jacob Laurel, Sasa Misailovic, Gagandeep Singh
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our method on the MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky, 2009), Tiny ImageNet (Le & Yang, 2015), and Udacity self-driving car (Udacity, 2016) datasets to demonstrate CGT's effectiveness. Our results show that CGT-trained networks consistently achieve state-of-the-art clean accuracy and certified robustness; furthermore, FGV is between 60× to 42,600× faster than the state-of-the-art verifier for certifying each image. We also achieve several breakthroughs: (1) FGV enables us to certify deterministic robustness against geometric transformations on entire test sets of 10,000 images, which is more than 50× the number of images over existing works (100 in Balunovic et al. (2019) and 200 in Mohapatra et al. (2020)); (2) we are the first to scale deterministic geometric verification beyond CIFAR10; and (3) we are the first to verify a neural network for autonomous driving under realistic geometric perturbations. |
| Researcher Affiliation | Collaboration | Rem Yang¹, Jacob Laurel¹, Sasa Misailovic¹, Gagandeep Singh¹,²; ¹University of Illinois Urbana-Champaign, ²VMware Research; {remyang2,jlaurel2,misailo,ggnds}@illinois.edu |
| Pseudocode | Yes | Algorithm 1 presents the pseudocode. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/uiuc-arc/CGT. |
| Open Datasets | Yes | We empirically evaluate our method on the MNIST (LeCun et al., 1998), CIFAR10 (Krizhevsky, 2009), Tiny ImageNet (Le & Yang, 2015), and Udacity self-driving car (Udacity, 2016) datasets to demonstrate CGT's effectiveness. |
| Dataset Splits | Yes | We perform an 80-20 train-validation split of the train set, and use CGT to train a network to completion. (A minimal split sketch appears after this table.) |
| Hardware Specification | Yes | We trained and certified all networks (except WideResNet) on a machine with a 2.40GHz 24-core Intel Xeon Silver 4214R CPU with 192GB of main memory and one Nvidia A100 GPU with 40GB of memory. All baseline results were also run on the same hardware for fair comparisons. For WideResNet, we used the same CPU with four A100 GPUs. |
| Software Dependencies | No | The paper states, 'We implemented CGT atop PyTorch (Paszke et al., 2019) and use auto_LiRPA (Xu et al., 2020)', but it does not specify version numbers for PyTorch, auto_LiRPA, or any other software dependencies. |
| Experiment Setup | Yes | We train the MNIST networks for 100 epochs with batch size 256, CIFAR10 networks for 120 epochs with batch size 128, Tiny ImageNet networks for 160 epochs with batch size 128 (CNN7) or 400 (WideResNet), and the self-driving network for 50 epochs with batch size 128. For the classifiers, we first train with only the cross-entropy loss during a warm-up period; we warm up for 15 epochs on MNIST and 30 epochs on CIFAR10 and Tiny ImageNet. For the self-driving network, we directly use Eq. 9 from the start. In order to ensure convergence for the loss, we linearly decay κ from 1 to κf = 0.5 and employ a linear ramp-up schedule to slowly increase the value of ν from 0 up to a final parameter size of νf; we ramp up across 50, 60, 80, and 50 epochs for MNIST, CIFAR10, Tiny ImageNet, and self-driving, respectively. We explain how to tune the hyperparameter νf and provide the values of νf for each experiment in Section D.4. (A sketch of these schedules appears below.) |
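
The 80-20 train-validation split quoted in the Dataset Splits row is standard PyTorch data handling. A minimal sketch, assuming MNIST loaded via torchvision; the root path `"data"` and seed `0` are illustrative choices, not the paper's.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Full MNIST train set (60,000 images); the root path is illustrative.
train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())

# 80-20 split with a fixed generator so the split is reproducible.
n_train = int(0.8 * len(train_full))
train_set, val_set = random_split(
    train_full,
    [n_train, len(train_full) - n_train],
    generator=torch.Generator().manual_seed(0),
)
```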
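
The κ decay and ν ramp-up quoted in the Experiment Setup row read as two linear, epoch-indexed schedules. Below is a minimal sketch under that reading; the names (`schedules`, `warmup_epochs`, `ramp_epochs`, `kappa_f`, `nu_f`) are ours, the default `nu_f` is a placeholder (the paper gives per-experiment values in its Section D.4), and the assumption that κ decays over the same window as the ν ramp-up is ours, not stated in the quote.

```python
def schedules(epoch: int, warmup_epochs: int, ramp_epochs: int,
              kappa_f: float = 0.5, nu_f: float = 0.1):
    """Return (kappa, nu) for a 0-indexed training epoch."""
    if epoch < warmup_epochs:
        # Warm-up period: cross-entropy loss only, so kappa = 1 and nu = 0.
        return 1.0, 0.0
    # Fraction of the ramp-up period completed, clamped to [0, 1].
    t = min(1.0, (epoch - warmup_epochs) / ramp_epochs)
    kappa = 1.0 + t * (kappa_f - 1.0)  # linear decay: 1 -> kappa_f
    nu = t * nu_f                      # linear ramp-up: 0 -> nu_f
    return kappa, nu

# Example: the quoted MNIST settings (100 epochs, 15 warm-up, 50 ramp-up).
for epoch in range(100):
    kappa, nu = schedules(epoch, warmup_epochs=15, ramp_epochs=50)
```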