Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Vision Transformers Beat WideResNets on Small Scale Datasets Adversarial Robustness

Authors: Juntao Wu, Ziyu Song, Xiaoyu Zhang, Shujun Xie, Longxin Lin, Ke Wang

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results provide a resounding affirmative answer. By employing ViT, enhanced with data generated by a diffusion model for adversarial training, we demonstrate that ViTs can indeed outshine WideResNet in terms of robust accuracy. Specifically, under the ℓ∞-norm threat model with ε = 8/255, our approach achieves robust accuracies of 74.97% on CIFAR-10 and 44.07% on CIFAR-100, representing improvements of +3.9% and +1.4%, respectively, over the previous SOTA models. Notably, our ViT-B/2 model, with 3 times fewer parameters, surpasses the previously best-performing WRN-70-16. (See also Tables 1–6 and Figures 1–3 for empirical results and comparisons.)
Researcher Affiliation | Academia | Juntao Wu, Ziyu Song, Xiaoyu Zhang, Shujun Xie, Longxin Lin, Ke Wang. State Key Laboratory of Bioactive Molecules and Druggability Assessment, Jinan University, Guangzhou, China; Guangdong Institute of Smart Education, College of Information Science and Technology, Jinan University, Guangzhou, China. EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using prose and diagrams (e.g., Figure 2: 'Overview of the proposed adversarial training pipeline'), but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | Specifically, under the ℓ∞-norm threat model with ε = 8/255, our approach achieves robust accuracies of 74.97% on CIFAR-10 and 44.07% on CIFAR-100... The advent of diffusion models marked a turning point, as Gowal et al. (2021) integrated these models into adversarial training, substantially fortifying adversarial robustness. Further advancements were made by Wang et al. (2023), who introduced higher-quality diffusion models, thereby pushing the boundaries of adversarial robustness even further. Despite these advancements substantially bolstering robust accuracy, the state-of-the-art (SOTA) robust accuracy on small-scale datasets such as CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009) remains firmly under the dominion of WideResNet (WRN) (Zagoruyko and Komodakis 2016).
Dataset Splits | Yes | Specifically, under the ℓ∞-norm threat model with ε = 8/255, our approach achieves robust accuracies of 74.97% on CIFAR-10 and 44.07% on CIFAR-100... Despite these advancements substantially bolstering robust accuracy, the state-of-the-art (SOTA) robust accuracy on small-scale datasets such as CIFAR-10 and CIFAR-100 (Krizhevsky, Hinton et al. 2009) remains firmly under the dominion of WideResNet (WRN) (Zagoruyko and Komodakis 2016). ... Therefore, our training data includes the entire training set without reserving a portion for validation.
Hardware Specification | Yes | All training is performed using TPU with JAX and then converted back to PyTorch for testing. ... We use TPU v4-256 for training, batch size 1024 for all configurations.
Software Dependencies | No | All training is performed using TPU with JAX and then converted back to PyTorch for testing. ... Our ViT implementation aligns with the standard ViT model from timm (Wightman 2019). For adversarial training, we employ TRADES (Zhang et al. 2019)... (The paper mentions software such as JAX, PyTorch, timm, and TRADES but does not provide specific version numbers for these dependencies, which are necessary for a reproducible description.)
Experiment Setup | Yes | Wang et al. (2023)'s experimental setup has been widely adopted in adversarial training for WRNs and has yielded promising results. As we aim to investigate the performance of ViTs, to ensure a fair comparison, we closely follow their experimental setup, incorporating label smoothing (Szegedy et al. 2016), Exponential Moving Average (EMA), and an 8:2 mix of generated and real data. However, we replace the SGD optimizer with the Lion optimizer (Chen et al. 2024), as ViTs do not perform optimally with SGD. ... Our ViT implementation aligns with the standard ViT model from timm (Wightman 2019). For adversarial training, we employ TRADES (Zhang et al. 2019) with β set to 3 for CIFAR-10 and 5 for CIFAR-100. The Lion optimizer is configured with β1 = 0.9, β2 = 0.99, and weight decay (WD) of 0.5, while a cosine annealing learning rate schedule (Loshchilov and Hutter 2016) with a peak learning rate of 1×10⁻⁴ is used. For image generation, we utilize the Elucidating Diffusion Model (EDM) (Karras et al. 2022), and we use the dataset generated by Wang et al. Computational Time: We use TPU v4-256 for training, batch size 1024 for all configurations. ... We test the sensitivity of basic training hyperparameters on CIFAR-10. ViT-B/2 models are trained for 2000 epochs using 50M data generated by EDM. 1024 is the default batch size. Label Smoothing: ... To demonstrate ViT's insensitivity to LS, we will use LS = 0.4 in subsequent experiments.
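The TRADES objective quoted in the setup above (β = 3 for CIFAR-10, β = 5 for CIFAR-100, label smoothing 0.4) combines a natural cross-entropy term with a KL-divergence robustness term. The following PyTorch sketch illustrates that combination under the stated hyperparameters; it is an assumption-laden reconstruction for clarity, not the authors' released code, and the function name `trades_loss` is hypothetical.

```python
import torch
import torch.nn.functional as F

def trades_loss(clean_logits, adv_logits, labels, beta=3.0, label_smoothing=0.4):
    """Illustrative TRADES objective (Zhang et al. 2019).

    beta = 3 (CIFAR-10) / 5 (CIFAR-100) and label smoothing 0.4 follow the
    values quoted from the paper; everything else here is a sketch.
    """
    # Natural loss on clean examples, with the label smoothing used in the setup.
    natural = F.cross_entropy(clean_logits, labels, label_smoothing=label_smoothing)
    # Robustness regularizer: KL divergence between the model's predictive
    # distributions on adversarial and clean inputs.
    robust = F.kl_div(
        F.log_softmax(adv_logits, dim=1),
        F.softmax(clean_logits, dim=1),
        reduction="batchmean",
    )
    return natural + beta * robust
```

When the adversarial and clean logits coincide, the KL term vanishes and the objective reduces to the smoothed cross-entropy alone, which is a quick sanity check on the implementation.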