One-vs-the-Rest Loss to Focus on Important Samples in Adversarial Training
Authors: Sekitoshi Kanai, Shin’ya Yamaguchi, Masanori Yamada, Hiroshi Takahashi, Kentaro Ohno, Yasutoshi Ida
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper experimentally reveals that the cause of [important samples'] vulnerability is their small margins between logits for the true label and the other labels (see the logit-margin sketch after this table). Experiments demonstrate that SOVR increases logit margins more than AT and outperforms AT, GAIRAT (Zhang et al., 2021b), MAIL (Liu et al., 2021), MART (Wang et al., 2020a), MMA (Ding et al., 2020), and EWAT (Kim et al., 2021) in terms of robustness against Auto-Attack. |
| Researcher Affiliation | Collaboration | NTT, Tokyo, Japan; Kyoto University, Kyoto, Japan. |
| Pseudocode | Yes | Algorithm 1: "Switching one-vs-the-rest by the criterion of a logit margin loss" (a hedged sketch of this switching rule follows the table). |
| Open Source Code | No | Our experimental codes are based on source codes provided by (Wu et al., 2020; Wang et al., 2020a; Ding et al., 2020; Rade, 2021), and 1M synthetic data is provided in (Gowal et al., 2021). |
| Open Datasets | Yes | We used three datasets: CIFAR10, SVHN, and CIFAR100 (Krizhevsky & Hinton, 2009; Netzer et al., 2011). |
| Dataset Splits | Yes | We used early stopping by evaluating test robust accuracies against PGD with K = 10. Generalization gap is a gap between training and test robust accuracies against PGD (K=20) at the last epoch. |
| Hardware Specification | Yes | We used one GPU among NVIDIA V100 and NVIDIA A100 for each training in experiments. |
| Software Dependencies | No | The paper mentions "PyTorch implementation" in a footnote related to external code, but does not provide specific version numbers for PyTorch or other software dependencies used in their experiments. |
| Experiment Setup | Yes | The L∞ norm of the perturbation was set to ε = 8/255, and all elements of x_i + δ_i were clipped so that they were in [0, 1]. We used PGD (K = 10, η = 2/255, ε = 8/255) in training. We used early stopping by evaluating test robust accuracies against PGD with K = 10. ... We set (M, λ) in SOVR to (40, 0.4) for CIFAR10 (RN18) and SOVR+AWP, (30, 0.4) for CIFAR10 (WRN), (50, 0.6) for CIFAR100, (20, 0.2) for SVHN, and (40, 0.2) for SOVR+1M DDPM. (A PGD sketch matching these attack hyperparameters follows the table.) |
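The logit margin quoted in the Research Type row is the gap between the true-class logit and the largest competing logit. Below is a minimal PyTorch sketch of that quantity; the function name `logit_margin` and the exact sign and normalization are assumptions for illustration, not the paper's code.

```python
import torch

def logit_margin(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Margin between the true-class logit and the largest other logit.

    A minimal sketch of the quantity the paper analyzes; the definition
    used in SOVR may differ in sign or scaling.
    """
    true_logit = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Mask out the true class before taking the max over the rest.
    masked = logits.scatter(1, labels.unsqueeze(1), float("-inf"))
    max_other = masked.max(dim=1).values
    return true_logit - max_other  # small or negative => vulnerable sample
```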
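Algorithm 1 switches between cross-entropy and a one-vs-the-rest loss using a logit margin loss as the criterion. The sketch below is one plausible reading under stated assumptions: the threshold `tau`, the sigmoid-based form of `ovr_loss`, and treating λ as a weight on the OVR term are all assumptions; the paper's rule for selecting (M, λ) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def ovr_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """One-vs-the-rest (sigmoid) loss: a hypothetical stand-in for the
    paper's OVR term; the exact form in SOVR may differ."""
    targets = F.one_hot(labels, logits.size(1)).float()
    return F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    ).sum(dim=1)

def sovr_like_loss(logits, labels, lam=0.4, tau=0.0):
    """Switch losses per sample by a logit-margin criterion.

    `tau` and the per-sample switching rule are assumptions; Algorithm 1
    selects samples via a logit margin loss.
    """
    margin = logit_margin(logits, labels)  # from the sketch above
    ce = F.cross_entropy(logits, labels, reduction="none")
    ovr = lam * ovr_loss(logits, labels)
    # Small-margin (vulnerable) samples get the weighted OVR loss.
    return torch.where(margin < tau, ovr, ce).mean()
```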
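The Experiment Setup row specifies L∞ PGD with K = 10 steps, step size η = 2/255, ε = 8/255, and clipping of x_i + δ_i to [0, 1]. A standard PGD sketch matching those numbers follows; the random start and the cross-entropy attack loss are assumptions about details the row does not state.

```python
import torch

def pgd_attack(model, x, y, eps=8 / 255, eta=2 / 255, steps=10):
    """L-infinity PGD as in the setup row: K=10 steps, step size eta=2/255,
    eps=8/255, inputs clipped to [0, 1]. A standard sketch; the random
    start is an assumption."""
    delta = torch.empty_like(x).uniform_(-eps, eps)  # random start (assumed)
    delta = (x + delta).clamp(0, 1) - x
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x + delta), y)
        (grad,) = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Ascend the loss, then project back into the eps-ball and [0, 1].
            delta = (delta + eta * grad.sign()).clamp(-eps, eps)
            delta = (x + delta).clamp(0, 1) - x
    return (x + delta).detach()
```

In adversarial training, the returned adversarial batch would replace the clean batch before computing a training loss such as the SOVR-style loss sketched above.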