Adversarially Robust Distillation
Authors: Micah Goldblum, Liam Fowl, Soheil Feizi, Tom Goldstein
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we find that ARD student models decisively outperform adversarially trained networks of identical architecture in terms of robust accuracy, surpassing state-of-the-art methods on standard robustness benchmarks. Finally, we adapt recent fast adversarial training methods to ARD for accelerated robust distillation. |
| Researcher Affiliation | Academia | Micah Goldblum, Liam Fowl, Soheil Feizi, Tom Goldstein University of Maryland 4176 Campus Drive, College Park, Maryland 20742 goldblum@umd.edu |
| Pseudocode | Yes | Algorithm 1: Adversarially Robust Distillation (ARD) and Algorithm 2: Fast-ARD with free adversarial training |
| Open Source Code | Yes | A PyTorch implementation of ARD can be found at: https://github.com/goldblum/AdversariallyRobustDistillation |
| Open Datasets | Yes | Table 1: Performance of an adv. trained (AT) teacher network and its student on CIFAR-10... Table 5: Robust teacher network and its students on CIFAR-100 |
| Dataset Splits | No | The paper mentions 'Robust validation accuracy' in Table 11, indicating that a validation set was used, but it does not provide specific details on the split percentages or counts for training, validation, and test datasets. |
| Hardware Specification | Yes | All models were trained on CIFAR-10 with a single RTX 2080 Ti GPU and identical batch sizes. |
| Software Dependencies | No | The paper mentions 'A PyTorch implementation' but does not specify the version of PyTorch or of any other software dependency. |
| Experiment Setup | Yes | We train our models for 200 epochs with SGD and a momentum term of 2×10⁻⁴. Fast-ARD models are trained for 200/m epochs so that they take the same amount of time as natural distillation. We use an initial learning rate of 0.1, and we decrease the learning rate by a factor of 10 on epochs 100 and 150 (epochs 100/m for Fast-ARD). We use a temperature term of 30 for CIFAR-10 and 5 for CIFAR-100. To craft adversarial examples during training, we use FGSM-based PGD with 10 steps, an ℓ∞ attack radius of ε = 8/255, a step size of 2/255, and a random start. |
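The attack settings reported in the setup row (10-step ℓ∞ PGD with ε = 8/255, a step size of 2/255, and a random start) can be sketched on a toy differentiable model. The scalar quadratic loss below is a hypothetical stand-in for the network, not the paper's implementation; only the attack hyperparameters come from the paper.

```python
import random

EPS = 8 / 255    # l-infinity attack radius from the paper
STEP = 2 / 255   # per-step size from the paper
STEPS = 10       # number of PGD steps from the paper

def loss(x, w=3.0, y=0.0):
    """Toy stand-in for the network loss: L(x) = (w*x - y)^2."""
    return (w * x - y) ** 2

def grad(x, w=3.0, y=0.0):
    """Analytic gradient of the toy loss with respect to the input x."""
    return 2 * w * (w * x - y)

def pgd_attack(x0):
    """l-infinity PGD: random start, signed-gradient (FGSM) steps,
    projection back into the eps-ball around x0, clipping to [0, 1]."""
    x = x0 + random.uniform(-EPS, EPS)               # random start
    for _ in range(STEPS):
        x = x + STEP * (1 if grad(x) >= 0 else -1)   # FGSM-style step
        x = max(x0 - EPS, min(x0 + EPS, x))          # project onto eps-ball
        x = max(0.0, min(1.0, x))                    # clip to pixel range
    return x

x0 = 0.5
x_adv = pgd_attack(x0)
print(abs(x_adv - x0) <= EPS + 1e-12)  # perturbation stays inside the ball
```

For this convex toy loss the attack saturates at the ball boundary, which is the expected behavior of signed-gradient PGD when every step points the same way.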
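The pseudocode row cites Algorithm 1 (ARD), whose core idea, per the abstract, is distilling a robust teacher into a student by matching temperature-scaled soft labels on adversarial inputs. A minimal sketch of such a distillation loss is below; the logits, the mixing weight `alpha`, and the exact pairing of clean-teacher / adversarial-student inputs are illustrative assumptions, not the authors' code.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(z / t) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def ard_loss(student_adv_logits, teacher_clean_logits, label, t=30.0, alpha=1.0):
    """Sketch of an ARD-style objective: temperature-scaled KL between the
    teacher's soft labels (clean input) and the student's predictions
    (adversarial input), optionally mixed with hard-label cross-entropy.
    alpha and the logits are hypothetical; t=30 is the CIFAR-10 temperature."""
    p_teacher = softmax(teacher_clean_logits, t)
    p_student = softmax(student_adv_logits, t)
    distill = t * t * kl(p_teacher, p_student)       # t^2 rescales gradients
    hard = -math.log(softmax(student_adv_logits)[label])
    return alpha * distill + (1 - alpha) * hard

# Hypothetical 3-class logits: the student drifts on the adversarial input.
loss_val = ard_loss([1.0, 0.2, -0.5], [2.0, 0.1, -1.0], label=0)
print(loss_val >= 0.0)  # KL divergence is non-negative
```

With identical student and teacher logits the KL term vanishes, so the loss reduces to the (weighted) hard-label cross-entropy, which is the usual sanity check for a distillation objective.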