Conserve-Update-Revise to Cure Generalization and Robustness Trade-off in Adversarial Training
Authors: Shruthi Gowda, Bahram Zonooz, Elahe Arani
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical findings demonstrate that selectively updating specific layers while preserving others can substantially enhance the network's learning capacity. We therefore propose CURE, a novel training framework that leverages a gradient prominence criterion to perform selective conservation, updating, and revision of weights. |
| Researcher Affiliation | Collaboration | 1 NavInfo Europe, 2 Eindhoven University of Technology, 3 TomTom, 4 Wayve |
| Pseudocode | Yes | Algorithm is detailed in Appendix, Section A. Algorithm 1 CURE: Conserve-Update-Revise |
| Open Source Code | Yes | The code is available at: https://github.com/NeurAI-Lab/CURE. |
| Open Datasets | Yes | Datasets used in our study include CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and SVHN. |
| Dataset Splits | No | The paper mentions training on datasets like CIFAR-10, CIFAR-100, and SVHN, and refers to "Adversarial Acc (validation)" in Figure 5(c), but it does not provide specific details on how the training, validation, or test splits were performed (e.g., percentages, sample counts, or explicit references to standard split methodologies). |
| Hardware Specification | No | The paper states that "all models are trained using the SGD optimizer" and discusses various experimental parameters, but it does not provide any specific details about the hardware used (e.g., CPU, GPU models, memory, or cloud instances). |
| Software Dependencies | No | The paper mentions the use of an "SGD optimizer" and "Projected Gradient Descent (PGD)" for adversarial image generation, but it does not specify the version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used. |
| Experiment Setup | Yes | For our method, all models are trained using the SGD optimizer with a momentum of 0.9. The augmentations include basic random crop and random flip operations. Projected Gradient Descent (PGD) is used to generate adversarial images. For adversarial training, PGD with step 10 is considered with perturbation strength ϵ = 8 and step size ϵ/4. Table 9 tabulates the other hyperparameters used in our method. The learning rate is 0.1, the number of epochs is 200 and the weight decay is 5e-3. The revision rate r and decay factor d for the revision stage are set to 0.2 and 0.999 for all the experiments. |
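
To make the quoted experiment setup concrete for reproduction attempts, below is a minimal PyTorch-style sketch of PGD-10 adversarial training with the reported optimizer settings (SGD, momentum 0.9, learning rate 0.1, weight decay 5e-3, 200 epochs, step size ϵ/4). It is not the authors' released code: `model` and `train_loader` are hypothetical placeholders, and ϵ = 8 is assumed to mean 8/255 on images scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, steps=10):
    """PGD with `steps` iterations and step size eps/4, as reported in the paper."""
    alpha = eps / 4
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta = (x + delta).clamp(0, 1) - x  # keep the perturbed image in [0, 1]
    return (x + delta).detach()

# Hypothetical `model` and `train_loader`; only the hyperparameters come from the paper.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-3)

for epoch in range(200):
    for x, y in train_loader:
        x_adv = pgd_attack(model, x, y)
        loss = F.cross_entropy(model(x_adv), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```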
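
The CURE-specific components quoted in the Research Type row (the gradient prominence criterion and the revision stage with r = 0.2, d = 0.999) are not fully specified in the excerpts above. Purely as a speculative illustration of what "selective conservation and updating" plus a stochastic revision could look like, and not the paper's verified procedure, a per-parameter gradient-magnitude mask and a probabilistically refreshed EMA copy of the weights might be sketched as follows; `keep_fraction`, `apply_prominence_mask`, and `maybe_revise` are invented names.

```python
import random
import torch

def apply_prominence_mask(model, keep_fraction=0.5):
    # Assumed criterion: update only the top `keep_fraction` of weights per tensor,
    # ranked by gradient magnitude, and conserve the rest by zeroing their gradients.
    # The paper's actual prominence measure may differ.
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.abs().flatten()
        num_keep = max(1, int(keep_fraction * g.numel()))
        threshold = torch.topk(g, num_keep).values.min()
        p.grad.mul_((p.grad.abs() >= threshold).float())

def maybe_revise(ema_model, model, r=0.2, d=0.999):
    # Assumed revision stage: with probability r, blend the live weights into an
    # EMA copy with decay d (one possible reading of the reported r and d values).
    if random.random() < r:
        with torch.no_grad():
            for p_ema, p in zip(ema_model.parameters(), model.parameters()):
                p_ema.mul_(d).add_(p, alpha=1 - d)
```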