Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off

Authors: Rahul Rade, Seyed-Mohsen Moosavi-Dezfooli

ICLR 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. LLM Response: This section constitutes an extensive evaluation of HAT. Initially, to test the universality of our approach, we study the performance of HAT with ResNets on different datasets and attack configurations. Next, we leverage extra data and wider networks to obtain state-of-the-art performance on conventional robustness benchmarks. Towards the end, we conduct experiments to analyze HAT.
Researcher Affiliation: Academia. LLM Response: Rahul Rade, ETH Zurich, Switzerland (rarade@ethz.ch); Seyed-Mohsen Moosavi-Dezfooli, Imperial College London, UK (seyed.moosavi@imperial.ac.uk).
Pseudocode: Yes. LLM Response: Algorithm 1, Helper-based Adversarial Training.
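For orientation, the following is a minimal PyTorch-style sketch of one helper-based training step. It assumes a TRADES-like objective (clean cross-entropy plus a β-weighted KL consistency term) extended with a γ-weighted cross-entropy on helper examples x + 2δ whose labels come from a separately trained standard model; all function names are illustrative, and the exact loss, attack objective, and point at which the helper label is computed should be taken from Algorithm 1 and the released code rather than from this sketch.

```python
import torch
import torch.nn.functional as F

def hat_step(model, std_model, x, y, eps=8/255, alpha=2/255, k=10,
             beta=2.5, gamma=0.5):
    """Sketch of one HAT-style loss computation (illustrative, not the authors' code)."""
    model.eval()
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)

    # Inner maximization: PGD on the KL divergence (TRADES-style); the paper's
    # Algorithm 1 may use a different attack objective.
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(k):
        kl = F.kl_div(F.log_softmax(model(x + delta), dim=1), p_clean,
                      reduction="batchmean")
        grad, = torch.autograd.grad(kl, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    delta = delta.detach()

    x_adv = (x + delta).clamp(0.0, 1.0)
    # Helper example: extrapolate twice as far along the adversarial direction.
    x_helper = (x + 2.0 * delta).clamp(0.0, 1.0)
    with torch.no_grad():
        # Helper label from a pre-trained standard (non-robust) model; whether it is
        # evaluated at x_adv or at x + 2*delta should be checked against Algorithm 1.
        y_helper = std_model(x_helper).argmax(dim=1)

    model.train()
    logits = model(x)
    loss = F.cross_entropy(logits, y)
    loss = loss + beta * F.kl_div(F.log_softmax(model(x_adv), dim=1),
                                  F.softmax(logits, dim=1),
                                  reduction="batchmean")
    loss = loss + gamma * F.cross_entropy(model(x_helper), y_helper)
    return loss
```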
Open Source Code: Yes. LLM Response: Code is available at https://github.com/imrahulr/hat.
Open Datasets: Yes. LLM Response: We report results using ResNet-18 (He et al., 2016) on three datasets: CIFAR-10, CIFAR-100 (Krizhevsky, 2009) and SVHN (Netzer et al., 2011). ... The efficacy of HAT indeed holds for large-scale datasets such as Tiny ImageNet-200 and ImageNet-100 (Deng et al., 2009).
Dataset Splits: Yes. LLM Response: Finally, we perform early-stopping by tracking the performance on a disjoint validation set using PGD (K=40) with margin loss (Carlini & Wagner, 2017). We separate the first 1024 samples from the training set for validation.
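As an illustration of the quoted split and selection criterion, here is a short sketch of holding out the first 1024 training samples and of a Carlini-Wagner style margin loss of the kind such a PGD (K=40) early-stopping check maximizes. CIFAR-10 is shown; the dataset choice, transform, and function names are illustrative rather than taken from the released code.

```python
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Disjoint validation split: the first 1024 training samples are held out,
# the remainder is used for training (other datasets handled analogously).
full_train = datasets.CIFAR10("data", train=True, download=True,
                              transform=transforms.ToTensor())
val_set = Subset(full_train, range(1024))
train_set = Subset(full_train, range(1024, len(full_train)))

def cw_margin_loss(logits, y):
    """Carlini-Wagner style margin: largest non-true-class logit minus the
    true-class logit, averaged over the batch. A PGD attack maximizes this;
    a positive per-sample value corresponds to misclassification."""
    true_logit = logits.gather(1, y.unsqueeze(1)).squeeze(1)
    masked = logits.clone()
    masked.scatter_(1, y.unsqueeze(1), float("-inf"))
    return (masked.max(dim=1).values - true_logit).mean()
```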
Hardware Specification: No. LLM Response: The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies: No. LLM Response: The paper mentions software components and techniques (e.g., SGD optimizer, Nesterov momentum, cyclic learning rates, SiLU activation function) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup: Yes. LLM Response: Precisely, we use the SGD optimizer with Nesterov momentum (Nesterov, 1983) and cyclic learning rates (Smith & Topin, 2018) with cosine annealing, with a maximum learning rate of 0.21 for CIFAR-10 and CIFAR-100, and 0.05 for SVHN. We train each model for 50 epochs on CIFAR-10 and CIFAR-100, and for 15 epochs on SVHN. For ℓ∞ training, we use the PGD attack with maximum perturbation ε = 8/255 and run the attack for K = 10 iterations on all datasets. The PGD step size is set to α = ε/4 = 2/255 for CIFAR-10 and CIFAR-100, and α = 1/255 for SVHN. For HAT, we fix γ to 0.5 and use β = 2.5 for CIFAR-10 and SVHN, and β = 3.5 for CIFAR-100. The regularization parameter β for TRADES is set to 5.0 for CIFAR-10 and SVHN, and 6.0 for CIFAR-100. For MART, we choose β = 5.0.
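To make the quoted hyperparameters concrete, a minimal PyTorch sketch of the CIFAR-10 configuration follows. The momentum, weight decay, batch size, and steps per epoch are assumptions not quoted above, and torchvision's ImageNet-style resnet18 merely stands in for the CIFAR ResNet-18 variant used in the paper.

```python
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # stand-in for the CIFAR ResNet-18 variant

# SGD with Nesterov momentum; only the maximum learning rate (0.21) is quoted,
# momentum and weight decay are assumed values.
optimizer = torch.optim.SGD(model.parameters(), lr=0.21, momentum=0.9,
                            weight_decay=5e-4, nesterov=True)

# Cyclic learning rate with cosine annealing over 50 epochs (one-cycle policy);
# 384 steps/epoch assumes ~49k training images at an assumed batch size of 128.
epochs, steps_per_epoch = 50, 384
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.21, total_steps=epochs * steps_per_epoch,
    anneal_strategy="cos")

# PGD configuration for l_inf training on CIFAR-10/100 (quoted above).
eps, alpha, pgd_steps = 8 / 255, 2 / 255, 10

# Regularization weights quoted above for CIFAR-10.
gamma_hat, beta_hat = 0.5, 2.5
beta_trades, beta_mart = 5.0, 5.0
```

Per the quoted setup, the same template would apply to SVHN with a maximum learning rate of 0.05, 15 epochs, and α = 1/255.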