Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Patches Are All You Need?
Authors: Asher Trockman, J. Zico Kolter
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. ... Results. A ConvMixer-1536/20 with 52M parameters can achieve 81.4% top-1 accuracy on ImageNet, and a ConvMixer-768/32 with 21M parameters 80.2% (see Table 1). |
| Researcher Affiliation | Collaboration | Asher Trockman, J. Zico Kolter¹ — Carnegie Mellon University and ¹Bosch Center for AI |
| Pseudocode | Yes | See Fig. 3 for an implementation of ConvMixer in PyTorch. ... We present an even more terse implementation of ConvMixer in Figure 8, which to the best of our knowledge is the first model that achieves the elusive dual goals of 82%+ ImageNet top-1 accuracy while also fitting into a tweet. |
| Open Source Code | Yes | Our code is available at https://github.com/locuslab/convmixer. |
| Open Datasets | Yes | We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. ... We also performed smaller-scale experiments on CIFAR-10 |
| Dataset Splits | Yes | We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. ... We also performed smaller-scale experiments on CIFAR-10 |
| Hardware Specification | Yes | ConvMixer-1536/20 took about 9 days to train (on 10 RTX8000s) for 150 epochs ... Throughputs measured on an RTX8000 GPU ... throughputs in this section were recorded using Tesla V100 GPUs ... averaged over 16 trials on an RTX 3080Ti GPU in half precision. |
| Software Dependencies | No | We added ConvMixer to the timm framework (Wightman, 2019) and trained it with nearly-standard settings for the common training procedure from this library: we used RandAugment (Cubuk et al., 2020), mixup (Zhang et al., 2017), CutMix (Yun et al., 2019), random erasing (Zhong et al., 2020), and gradient norm clipping in addition to default timm augmentation. We used the AdamW (Loshchilov & Hutter, 2018) optimizer and a simple triangular learning rate schedule. |
| Experiment Setup | Yes | Training setup. We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. We added ConvMixer to the timm framework (Wightman, 2019) and trained it with nearly-standard settings for the common training procedure from this library: we used RandAugment (Cubuk et al., 2020), mixup (Zhang et al., 2017), CutMix (Yun et al., 2019), random erasing (Zhong et al., 2020), and gradient norm clipping in addition to default timm augmentation. We used the AdamW (Loshchilov & Hutter, 2018) optimizer and a simple triangular learning rate schedule. ... ConvMixer-1536/20 took about 9 days to train (on 10 RTX8000s) for 150 epochs, and ConvMixer-768/32 is over twice as fast, making 300 epochs more feasible. ... In particular, we adjusted parameters for RandAugment, mixup, CutMix, random erasing, and weight decay to match those in the procedure. |
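The Pseudocode row above refers to the paper's terse PyTorch implementation (its Figures 3 and 8). As a hedged sketch of the architecture the excerpts describe — a patch-embedding convolution, then repeated blocks of depthwise convolution (spatial mixing) with a residual connection followed by a pointwise convolution (channel mixing) — it might look like the following; the `Residual` helper and the function name `conv_mixer` are illustrative choices here, and the exact code in the paper's figures may differ:

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Skip connection: adds the wrapped module's input to its output."""
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def conv_mixer(dim, depth, kernel_size=9, patch_size=7, n_classes=1000):
    # Patch embedding: a strided convolution splits the image into
    # non-overlapping patch_size x patch_size patches, each mapped to dim channels.
    return nn.Sequential(
        nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size),
        nn.GELU(), nn.BatchNorm2d(dim),
        # `depth` ConvMixer blocks: depthwise conv (spatial mixing) inside a
        # residual connection, then a 1x1 pointwise conv (channel mixing).
        *[nn.Sequential(
            Residual(nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(), nn.BatchNorm2d(dim))),
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(), nn.BatchNorm2d(dim))
          for _ in range(depth)],
        # Global average pooling followed by a linear classifier head.
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.Linear(dim, n_classes))
```

Under this naming, ConvMixer-1536/20 would correspond to `conv_mixer(dim=1536, depth=20)`. Note that `padding="same"` in `nn.Conv2d` requires PyTorch 1.9 or later.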