Adversarially Robust Distillation

Authors: Micah Goldblum, Liam Fowl, Soheil Feizi, Tom Goldstein (pp. 3996-4003)

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation.
Researcher Affiliation | Academia | Micah Goldblum, Liam Fowl, Soheil Feizi, Tom Goldstein, University of Maryland, 4176 Campus Drive, College Park, Maryland 20742, goldblum@umd.edu
Pseudocode | Yes | Algorithm 1: Adversarially Robust Distillation (ARD) and Algorithm 2: Fast-ARD with free adversarial training
Open Source Code | Yes | A PyTorch implementation of ARD can be found at: https://github.com/goldblum/AdversariallyRobustDistillation
Open Datasets | Yes | Table 1: Performance of an adv. trained (AT) teacher network and its student on CIFAR-10... Table 5: Robust teacher network and its students on CIFAR-100
Dataset Splits | No | The paper mentions 'Robust validation accuracy' in Table 11, indicating that a validation set was used, but it does not give split percentages or example counts for the training, validation, and test sets.
Hardware Specification | Yes | All models were trained on CIFAR-10 with a single RTX 2080 Ti GPU and identical batch sizes.
Software Dependencies | No | The paper mentions 'A PyTorch implementation' but does not specify a version number for PyTorch or for any other software dependency.
Experiment Setup | Yes | We train our models for 200 epochs with SGD and a momentum term of 2×10⁻⁴. Fast-ARD models are trained for 200/m epochs so that they take the same amount of time as natural distillation. We use an initial learning rate of 0.1, and we decrease the learning rate by a factor of 10 on epochs 100 and 150 (epochs 100/m for Fast-ARD). We use a temperature term of 30 for CIFAR-10 and 5 for CIFAR-100. To craft adversarial examples during training, we use FGSM-based PGD with 10 steps, an ℓ∞ attack radius of ε = 8/255, a step size of 2/255, and a random start.
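The attack configuration quoted in the Experiment Setup row (10-step ℓ∞ PGD, ε = 8/255, step size 2/255, random start) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' PyTorch code; `grad_fn` is a hypothetical callable returning the gradient of the training loss with respect to the input.

```python
import numpy as np

def pgd_attack(grad_fn, x, eps=8 / 255, step=2 / 255, steps=10, rng=None):
    """Sketch of l-infinity PGD with a random start, using the quoted
    hyperparameters. grad_fn(x_adv) is assumed to return dLoss/dInput."""
    rng = np.random.default_rng() if rng is None else rng
    # Random start: perturb uniformly inside the eps-ball, then clip to [0, 1].
    x_adv = np.clip(x + rng.uniform(-eps, eps, size=x.shape), 0.0, 1.0)
    for _ in range(steps):
        g = grad_fn(x_adv)
        # FGSM-style step: ascend in the sign of the input gradient.
        x_adv = x_adv + step * np.sign(g)
        # Project back into the eps-ball around x and the valid pixel range.
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

With 10 steps of size 2/255 the iterates can traverse the whole 8/255 ball, so the projection step is what keeps the perturbation bounded.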
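The objective behind Algorithm 1 (ARD) listed in the Pseudocode row can be sketched as below: a temperature-scaled KL divergence between the teacher's output on clean inputs and the student's output on adversarial inputs, mixed with a natural cross-entropy term. The t² scaling and the α mixing weight follow the common knowledge-distillation convention and are assumptions here, not a transcription of the authors' code.

```python
import numpy as np

def softmax(z, t=1.0):
    """Temperature-scaled softmax, numerically stabilized."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ard_loss(student_adv_logits, teacher_clean_logits, student_clean_logits,
             labels, t=30.0, alpha=1.0):
    """Sketch of the ARD objective: distill the teacher's clean-input
    predictions into the student's adversarial-input predictions (KL term),
    optionally mixed with natural cross-entropy on clean inputs."""
    p_teacher = softmax(teacher_clean_logits, t)
    p_student = softmax(student_adv_logits, t)
    # KL(teacher || student) per example, with t^2 scaling as in standard KD.
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                             - np.log(p_student + 1e-12)), axis=-1)
    # Natural cross-entropy of the student on clean inputs.
    p_nat = softmax(student_clean_logits)
    ce = -np.log(p_nat[np.arange(len(labels)), labels] + 1e-12)
    return np.mean(alpha * t * t * kl + (1 - alpha) * ce)
```

The temperature t = 30 matches the CIFAR-10 value quoted in the Experiment Setup row; the adversarial inputs fed to the student would be crafted with the PGD configuration described there.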