Can CNNs Be More Robust Than Transformers?

Authors: Zeyu Wang, Yutong Bai, Yuyin Zhou, Cihang Xie

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments verify that all these three architectural elements consistently and effectively improve out-of-distribution robustness, from the perspective of neural architecture design.
Researcher Affiliation | Academia | Zeyu Wang^1, Yutong Bai^2, Yuyin Zhou^1, Cihang Xie^1; ^1 UC Santa Cruz, ^2 Johns Hopkins University
Pseudocode | No | The paper includes architectural diagrams (Figure 2, Figure 4, Figure 5) but no pseudocode or algorithm blocks.
Open Source Code | Yes | The code is publicly available at https://github.com/UCSC-VLAA/RobustCNN.
Open Datasets | Yes | Stylized-ImageNet (Geirhos et al., 2018), ImageNet-C (Hendrycks & Dietterich, 2018), ImageNet-R (Hendrycks et al., 2021), ImageNet-Sketch (Wang et al., 2019)
Dataset Splits | No | The paper refers to using the "standard 300-epoch DeiT training recipe (Touvron et al., 2021a)" and reports ImageNet (IN) accuracy, which is typically measured on a validation set. However, it does not explicitly specify dataset splits (e.g., percentages or sample counts) or refer to a 'validation set' split anywhere in the text.
Hardware Specification | Yes | This work is supported by a gift from Open Philanthropy, TPU Research Cloud (TRC) program, and Google Cloud Research Credits program.
Software Dependencies | No | The paper mentions using the "AdamW optimizer" and various data augmentation strategies (RandAug, MixUp, CutMix, Random Erasing, Stochastic Depth, Repeated Augmentation), but it does not specify any software names with version numbers (e.g., PyTorch 1.x, Python 3.x).
Experiment Setup | Yes | We follow the standard 300-epoch DeiT training recipe (Touvron et al., 2021a) in this work. Specifically, we train all models using the AdamW optimizer (Loshchilov & Hutter, 2019). We set the initial base learning rate to 5e-4, and apply the cosine learning rate scheduler to decrease it. Besides weight decay, we additionally adopt six data augmentation & regularization strategies (i.e., RandAug (Cubuk et al., 2020), MixUp (Zhang et al., 2018), CutMix (Yun et al., 2019), Random Erasing (Zhong et al., 2020), Stochastic Depth (Huang et al., 2016), and Repeated Augmentation (Hoffer et al., 2020)) to regularize training.
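
The recipe quoted in the Experiment Setup row maps onto a fairly standard PyTorch/timm training configuration. The sketch below is an illustration, not the authors' released code: the weight-decay value, the RandAugment magnitude string, the MixUp/CutMix settings, and the helper names build_train_transform / build_optimizer_and_scheduler are assumptions, and Stochastic Depth and Repeated Augmentation are omitted because they are configured at the model and data-sampler level rather than in the transform pipeline.

    # Minimal sketch of the quoted DeiT-style recipe: 300 epochs, AdamW,
    # base LR 5e-4, cosine decay, plus RandAugment / MixUp / CutMix /
    # Random Erasing. Specific hyperparameter values marked "assumed"
    # are not stated in the quoted text.
    import torch
    from torch.optim import AdamW
    from torch.optim.lr_scheduler import CosineAnnealingLR
    from torchvision import transforms
    from timm.data import Mixup, rand_augment_transform

    EPOCHS = 300
    BASE_LR = 5e-4          # initial base learning rate from the recipe
    WEIGHT_DECAY = 0.05     # assumed value, typical for DeiT-style training

    def build_train_transform():
        # RandAugment (magnitude string assumed) and Random Erasing,
        # two of the six augmentation strategies listed in the paper.
        return transforms.Compose([
            transforms.RandomResizedCrop(224),
            transforms.RandomHorizontalFlip(),
            rand_augment_transform("rand-m9-mstd0.5", hparams={}),
            transforms.ToTensor(),
            transforms.RandomErasing(p=0.25),
        ])

    def build_optimizer_and_scheduler(model):
        optimizer = AdamW(model.parameters(), lr=BASE_LR,
                          weight_decay=WEIGHT_DECAY)
        # Cosine schedule that decreases the LR over the full 300 epochs.
        scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
        return optimizer, scheduler

    # MixUp / CutMix are applied per batch; the alpha values and label
    # smoothing here are assumed defaults, not taken from the paper.
    mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0,
                     label_smoothing=0.1, num_classes=1000)

In a training loop one would apply build_train_transform() in the dataset, call mixup_fn(images, targets) on each batch, and step the scheduler once per epoch; the released repository should be treated as the authoritative implementation.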