Can CNNs Be More Robust Than Transformers?

Authors: Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments verify that all these three architectural elements consistently and effectively improve out-of-distribution robustness, from the perspective of neural architecture design.
Researcher Affiliation | Academia | Zeyu Wang^1, Yutong Bai^2, Yuyin Zhou^1, Cihang Xie^1; ^1 UC Santa Cruz, ^2 Johns Hopkins University
Pseudocode | No | The paper includes architectural diagrams (Figure 2, Figure 4, Figure 5) but no pseudocode or algorithm blocks.
Open Source Code | Yes | The code is publicly available at https://github.com/UCSC-VLAA/RobustCNN.
Open Datasets | Yes | Stylized-ImageNet (Geirhos et al., 2018), ImageNet-C (Hendrycks & Dietterich, 2018), ImageNet-R (Hendrycks et al., 2021), ImageNet-Sketch (Wang et al., 2019)
Dataset Splits | No | The paper refers to using the "standard 300-epoch DeiT training recipe (Touvron et al., 2021a)" and reports ImageNet (IN) accuracy, which is typically measured on a validation set. However, it does not explicitly specify dataset splits (e.g., percentages or sample counts) or refer to a 'validation set' split anywhere in the text.
Hardware Specification | Yes | This work is supported by a gift from Open Philanthropy, TPU Research Cloud (TRC) program, and Google Cloud Research Credits program.
Software Dependencies | No | The paper mentions using the "AdamW optimizer" and various data augmentation strategies (RandAug, MixUp, CutMix, Random Erasing, Stochastic Depth, Repeated Augmentation), but it does not specify any software names with version numbers (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | We follow the standard 300-epoch DeiT training recipe (Touvron et al., 2021a) in this work. Specifically, we train all models using the AdamW optimizer (Loshchilov & Hutter, 2019). We set the initial base learning rate to 5e-4, and apply the cosine learning rate scheduler to decrease it. Besides weight decay, we additionally adopt six data augmentation & regularization strategies (i.e., RandAug (Cubuk et al., 2020), MixUp (Zhang et al., 2018), CutMix (Yun et al., 2019), Random Erasing (Zhong et al., 2020), Stochastic Depth (Huang et al., 2016), and Repeated Augmentation (Hoffer et al., 2020)) to regularize training.
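
The recipe quoted in the Experiment Setup row maps onto a fairly standard PyTorch/timm training configuration. The sketch below is an illustration, not the authors' released code: the weight-decay value, the RandAugment magnitude string, the MixUp/CutMix settings, and the helper names build_train_transform / build_optimizer_and_scheduler are assumptions, and Stochastic Depth and Repeated Augmentation are omitted because they are configured at the model and data-sampler level rather than in the transform pipeline.

    # Minimal sketch of the quoted DeiT-style recipe: 300 epochs, AdamW,
    # base LR 5e-4, cosine decay, plus RandAugment / MixUp / CutMix /
    # Random Erasing. Specific hyperparameter values marked "assumed"
    # are not stated in the quoted text.
    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import CosineAnnealingLR
    from torchvision import transforms
    from timm.data import Mixup, rand_augment_transform

    EPOCHS = 300
    BASE_LR = 5e-4          # initial base learning rate from the recipe
    WEIGHT_DECAY = 0.05     # assumed value, typical for DeiT-style training

    def build_train_transform():
        # RandAugment (magnitude string assumed) and Random Erasing,
        # two of the six augmentation strategies listed in the paper.
        return transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            rand_augment_transform("rand-m9-mstd0.5", hparams={}),
            transforms.ToTensor(),
            transforms.RandomErasing(p=0.25),
        ])

    def build_optimizer_and_scheduler(model):
        optimizer = AdamW(model.parameters(), lr=BASE_LR,
                          weight_decay=WEIGHT_DECAY)
        # Cosine schedule that decreases the LR over the full 300 epochs.
        scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
        return optimizer, scheduler

    # MixUp / CutMix are applied per batch; the alpha values and label
    # smoothing here are assumed defaults, not taken from the paper.
    mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0,
                     label_smoothing=0.1, num_classes=1000)

In a training loop one would apply build_train_transform() in the dataset, call mixup_fn(images, targets) on each batch, and step the scheduler once per epoch; the released repository should be treated as the authoritative implementation.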