How Do Vision Transformers Work?
Authors: Namuk Park, Songkuk Kim
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We obtain the main experimental results from two sets of machines for CIFAR (Krizhevsky et al., 2009). The first set consists of an Intel Xeon W-2123 Processor, 32GB memory, and a single GeForce RTX 2080 Ti, and the other set of four Intel Broadwell CPUs, 15GB memory, and a single NVIDIA T4. For ImageNet (Russakovsky et al., 2015), we use an AMD Ryzen Threadripper 3960X 24-Core Processor, 256GB memory, and four GeForce RTX 2080 Ti. |
| Researcher Affiliation | Collaboration | Yonsei University, NAVER AI Lab {namuk.park,songkuk}@yonsei.ac.kr |
| Pseudocode | No | The paper describes architectural patterns using diagrams (e.g., Figure 3, Figure 11) and textual descriptions, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/xxxnell/how-do-vits-work. |
| Open Datasets | Yes | We obtain the main experimental results from two sets of machines for CIFAR (Krizhevsky et al., 2009). ... For ImageNet (Russakovsky et al., 2015) |
| Dataset Splits | No | The paper mentions training on CIFAR and ImageNet and evaluating on their test sets, but it does not specify explicit validation splits (e.g., percentages or counts) for model training or hyperparameter tuning. It mentions using "10% of the training dataset" for the Hessian max eigenvalue spectrum analysis, but this is not a standard validation split. |
| Hardware Specification | Yes | The first set consists of an Intel Xeon W-2123 Processor, 32GB memory, and a single GeForce RTX 2080 Ti, and the other set of four Intel Broadwell CPUs, 15GB memory, and a single NVIDIA T4. For ImageNet (Russakovsky et al., 2015), we use an AMD Ryzen Threadripper 3960X 24-Core Processor, 256GB memory, and four GeForce RTX 2080 Ti. |
| Software Dependencies | No | NN models are implemented in PyTorch (Paszke et al., 2019). While PyTorch is mentioned and cited, a specific version number (e.g., 1.9, 1.10) is not provided. |
| Experiment Setup | Yes | We train NNs using categorical cross-entropy (NLL) loss and the AdamW optimizer (Loshchilov & Hutter, 2019) with an initial learning rate of 1.25 × 10⁻⁴ and weight decay of 5 × 10⁻². We also use a cosine annealing scheduler (Loshchilov & Hutter, 2017). NNs are trained for 300 epochs with a batch size of 96 on CIFAR, and a batch size of 128 on ImageNet. The learning rate is gradually increased (Goyal et al., 2017) for 5 epochs. Following Touvron et al. (2021), strong data augmentations such as RandAugment (Cubuk et al., 2020), Random Erasing (Zhong et al., 2020), label smoothing (Szegedy et al., 2016), mixup (Zhang et al., 2018), and CutMix (Yun et al., 2019) are used for training. Stochastic depth (Huang et al., 2016) is also used to regularize NNs. |
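The training recipe quoted in the Experiment Setup row can be collected into a small configuration sketch. The schedule function below is a plain-Python illustration of linear warmup followed by cosine annealing; the exact warmup shape and the annealing-to-zero floor are our assumptions, since the paper only states that the learning rate is "gradually increased" for 5 epochs and that a cosine annealing scheduler is used.

```python
import math

# Hyperparameters quoted from the paper's Experiment Setup row.
CONFIG = {
    "optimizer": "AdamW",
    "lr": 1.25e-4,
    "weight_decay": 5e-2,
    "epochs": 300,
    "batch_size_cifar": 96,
    "batch_size_imagenet": 128,
    "warmup_epochs": 5,  # "gradually increased for 5 epochs"
}

def lr_at_epoch(epoch,
                base_lr=CONFIG["lr"],
                warmup=CONFIG["warmup_epochs"],
                total=CONFIG["epochs"]):
    """Linear warmup for `warmup` epochs, then cosine annealing.

    Assumption: warmup is linear and the cosine schedule decays to
    zero; the paper does not spell out either detail.
    """
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, total - warmup)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

print(lr_at_epoch(0))    # base_lr / 5 during warmup
print(lr_at_epoch(4))    # warmup complete: base_lr
print(lr_at_epoch(299))  # near zero at the end of training
```

In an actual PyTorch run these values would typically be handed to `torch.optim.AdamW` and a cosine scheduler rather than computed by hand; the sketch only makes the quoted numbers concrete.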