On the Scalability of Certified Adversarial Robustness with Generated Data
Authors: Thomas Altstidl, David Dobre, Arthur Kosmala, Bjoern Eskofier, Gauthier Gidel, Leo Schwinn
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical study, we analyze models trained to be robust against ℓ∞ and models trained to be robust against ℓ2 norm attacks. The proposed approach improves robustness for both the (ℓ∞, ϵ = 8/255) and (ℓ2, ϵ = 36/255) threat models on CIFAR-10, improving upon the previous results in the literature by 3.95%p and 1.39%p (percentage points). Our approach achieves state-of-the-art deterministic robustness certificates on CIFAR-10 for the ℓ2 (ϵ = 36/255) and ℓ∞ (ϵ = 8/255) threat models, outperforming the previous results by +3.95 and +1.39 percentage points, respectively. |
| Researcher Affiliation | Academia | 1 Machine Learning and Data Analytics Lab, FAU Erlangen-Nürnberg, Germany; 2 Institute of AI for Health, Helmholtz Zentrum München, Germany; 3 Mila, Université de Montréal, Canada; 4 Canada CIFAR AI Chair; 5 Data Analytics and Machine Learning, Technische Universität München, Germany. {thomas.r.altstidl,bjoern.eskofier}@fau.de {david-a.dobre,gidelgau}@mila.quebec {a.kosmala,l.schwinn}@tum.de |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | All code used to produce the results and figures in this paper will be released on GitHub after publication. |
| Open Datasets | Yes | We perform experiments on CIFAR-10 and CIFAR-100 [22], for which EDM-generated data is readily available and a wealth of previous robustness research exists [6]. |
| Dataset Splits | No | The paper mentions using the CIFAR-10 and CIFAR-100 datasets for training and testing and describes augmenting the training data with generated images (a data-mixing sketch follows the table), but it does not explicitly state details about training, validation, and test splits (e.g., percentages or specific subsets used for validation). |
| Hardware Specification | Yes | All our experiments are done on a single Nvidia A100 graphics card (40GB of VRAM) without distributed training. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | With additional data, it is also expected that both model size and the number of training epochs can be further scaled to improve clean accuracy and robustness. We thus perform experiments on the influence of model depth and the number of epochs on clean and certified accuracy. For some models, we investigate further techniques that add learning capacity. Concretely, for SortNet [18] we also experiment with models that do not employ dropout, and for LOT [17] we adjust the learning rate scheduler to cosine annealing [28] (a scheduler sketch follows the table). |
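The Open Datasets and Dataset Splits rows note that the CIFAR-10/CIFAR-100 training data is augmented with EDM-generated images. Below is a minimal PyTorch sketch of one way such a mix can be set up; the `.npz` file name, its `image`/`label` keys, and the batch size are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumptions noted in comments): augmenting the CIFAR-10 training
# set with a pre-generated image archive, then iterating over the combined dataset.
import numpy as np
import torch
from torch.utils.data import ConcatDataset, DataLoader, Dataset
from torchvision import datasets, transforms


class GeneratedImages(Dataset):
    """Wraps an .npz archive of generated images (assumed keys: 'image', 'label')."""

    def __init__(self, npz_path, transform=None):
        data = np.load(npz_path)
        self.images = data["image"]   # assumed shape (N, 32, 32, 3), uint8
        self.labels = data["label"]   # assumed shape (N,)
        self.transform = transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = transforms.functional.to_pil_image(self.images[idx])
        if self.transform is not None:
            img = self.transform(img)
        return img, int(self.labels[idx])


transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
original = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
generated = GeneratedImages("edm_generated_cifar10.npz", transform=transform)  # hypothetical file
train_loader = DataLoader(ConcatDataset([original, generated]), batch_size=256, shuffle=True)
```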
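The Experiment Setup row mentions switching LOT's learning rate schedule to cosine annealing. A minimal sketch of that change using PyTorch's built-in scheduler is shown below; the model, optimizer, base learning rate, and epoch count are placeholder assumptions rather than the paper's hyperparameters.

```python
# Minimal sketch: cosine annealing learning rate schedule via PyTorch's built-in
# CosineAnnealingLR. Model, optimizer, and epoch count are placeholders.
import torch

model = torch.nn.Linear(3 * 32 * 32, 10)                     # stand-in for the certified model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
epochs = 200                                                  # assumed; the paper varies this
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... forward/backward passes over the augmented training data would go here ...
    optimizer.step()      # normally called once per batch, after loss.backward()
    scheduler.step()      # decay the learning rate along a cosine curve, once per epoch
```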