Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off
Authors: Rahul Rade, Seyed-Mohsen Moosavi-Dezfooli
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section constitutes an extensive evaluation of HAT. Initially, to test the universality of our approach, we study the performance of HAT with ResNets on different datasets and attack configurations. Next, we leverage extra data and wider networks to obtain state-of-the-art performance on conventional robustness benchmarks. Towards the end, we conduct experiments to analyze HAT. |
| Researcher Affiliation | Academia | Rahul Rade ETH Zurich, Switzerland EMAIL Seyed-Mohsen Moosavi-Dezfooli Imperial College London, UK EMAIL |
| Pseudocode | Yes | Algorithm 1 Helper-based Adversarial Training |
| Open Source Code | Yes | Code is available at https://github.com/imrahulr/hat. |
| Open Datasets | Yes | We report results using ResNet-18 (He et al., 2016) on three datasets: CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011). ... The efficacy of HAT indeed holds for large-scale datasets such as Tiny-ImageNet-200 and ImageNet-100 (Deng et al., 2009). |
| Dataset Splits | Yes | Finally, we perform early-stopping by tracking the performance on a disjoint validation set using PGD (K=40) with margin loss (Carlini & Wagner, 2017). We separate the first 1024 samples from the training set for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and techniques (e.g., SGD optimizer, Nesterov momentum, cyclic learning rates, SiLU activation function) but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Precisely, we use SGD optimizer with Nesterov momentum (Nesterov, 1983); cyclic learning rates (Smith & Topin, 2018) with cosine annealing and a maximum learning rate of 0.21 for CIFAR-10, CIFAR-100, and 0.05 for SVHN. We train each model for 50 epochs on CIFAR-10 and CIFAR-100 whereas we apply 15 epochs on SVHN. For ℓ∞ training, we use PGD attack with maximum perturbation ε = 8/255 and run the attack for K = 10 iterations for all datasets. The PGD step size is set to α = ε/4 = 2/255 for CIFAR-10, CIFAR-100; α = 1/255 for SVHN. For HAT, we fix γ to 0.5 and use β = 2.5 for CIFAR-10 and SVHN; β = 3.5 for CIFAR-100. Whereas the regularization parameter β for TRADES is set to 5.0 for CIFAR-10, SVHN and 6.0 for CIFAR-100. For MART, we choose β = 5.0. |
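The ℓ∞ PGD configuration quoted above (ε = 8/255, K = 10 iterations, step size α = ε/4 = 2/255 for CIFAR-10/100) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the paper trains ResNet-18 models, whereas the toy linear softmax classifier and the `pgd_attack` helper here are hypothetical stand-ins used only to show the attack loop and its hyperparameters.

```python
import numpy as np

# Hyperparameters as reported for CIFAR-10/100 in the paper.
EPS = 8 / 255       # maximum ell_inf perturbation
ALPHA = EPS / 4     # PGD step size (2/255); the paper uses 1/255 for SVHN
K = 10              # number of PGD iterations

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def input_grad(W, x, y):
    """Gradient of softmax cross-entropy w.r.t. the input x of a linear model.

    Illustrative stand-in for backprop through the network used in the paper.
    """
    p = softmax(x @ W)                    # class probabilities, shape (n, classes)
    p[np.arange(len(y)), y] -= 1.0        # dL/dlogits for cross-entropy
    return p @ W.T                        # chain rule back to the input

def pgd_attack(W, x, y, eps=EPS, alpha=ALPHA, steps=K):
    """ell_inf PGD: signed gradient ascent, projected back into the eps-ball."""
    x_adv = x.copy()
    for _ in range(steps):
        g = input_grad(W, x_adv, y)
        x_adv = x_adv + alpha * np.sign(g)        # ascent step on the loss
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to the eps-ball around x
        x_adv = np.clip(x_adv, 0.0, 1.0)          # keep valid pixel range
    return x_adv
```

Each iterate stays within ε of the clean input in the ℓ∞ norm and inside the valid pixel range, matching the standard PGD formulation the quoted setup refers to.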