Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Patches Are All You Need?
Authors: Asher Trockman, J. Zico Kolter
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. ... Results. A ConvMixer-1536/20 with 52M parameters can achieve 81.4% top-1 accuracy on ImageNet, and a ConvMixer-768/32 with 21M parameters 80.2% (see Table 1). |
| Researcher Affiliation | Collaboration | Asher Trockman, J. Zico Kolter¹ — Carnegie Mellon University and ¹Bosch Center for AI |
| Pseudocode | Yes | See Fig. 3 for an implementation of ConvMixer in PyTorch. ... We present an even more terse implementation of ConvMixer in Figure 8, which to the best of our knowledge is the first model that achieves the elusive dual goals of 82%+ ImageNet top-1 accuracy while also fitting into a tweet. |
| Open Source Code | Yes | Our code is available at https://github.com/locuslab/convmixer. |
| Open Datasets | Yes | We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. ... We also performed smaller-scale experiments on CIFAR-10 |
| Dataset Splits | Yes | We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. ... We also performed smaller-scale experiments on CIFAR-10 |
| Hardware Specification | Yes | ConvMixer-1536/20 took about 9 days to train (on 10 RTX8000s) for 150 epochs ... Throughputs measured on an RTX8000 GPU ... throughputs in this section were recorded using Tesla V100 GPUs ... averaged over 16 trials on an RTX 3080Ti GPU in half precision. |
| Software Dependencies | No | We added ConvMixer to the timm framework (Wightman, 2019) and trained it with nearly-standard settings for the common training procedure from this library: we used RandAugment (Cubuk et al., 2020), mixup (Zhang et al., 2017), CutMix (Yun et al., 2019), random erasing (Zhong et al., 2020), and gradient norm clipping in addition to default timm augmentation. We used the AdamW (Loshchilov & Hutter, 2018) optimizer and a simple triangular learning rate schedule. |
| Experiment Setup | Yes | Training setup. We primarily evaluate ConvMixers on ImageNet-1k classification without any pretraining or additional data. We added ConvMixer to the timm framework (Wightman, 2019) and trained it with nearly-standard settings for the common training procedure from this library: we used RandAugment (Cubuk et al., 2020), mixup (Zhang et al., 2017), CutMix (Yun et al., 2019), random erasing (Zhong et al., 2020), and gradient norm clipping in addition to default timm augmentation. We used the AdamW (Loshchilov & Hutter, 2018) optimizer and a simple triangular learning rate schedule. ... ConvMixer-1536/20 took about 9 days to train (on 10 RTX8000s) for 150 epochs, and ConvMixer-768/32 is over twice as fast, making 300 epochs more feasible. ... In particular, we adjusted parameters for RandAugment, mixup, CutMix, random erasing, and weight decay to match those in the procedure. |
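The Pseudocode row above refers to the paper's terse PyTorch implementation (its Figures 3 and 8). As a hedged sketch of the architecture the excerpts describe — a patch-embedding convolution, then repeated blocks of depthwise convolution (spatial mixing) with a residual connection followed by a pointwise convolution (channel mixing) — it might look like the following; the `Residual` helper and the function name `conv_mixer` are illustrative choices here, and the exact code in the paper's figures may differ:

```python
import torch
import torch.nn as nn

class Residual(nn.Module):
    """Skip connection: adds the wrapped module's input to its output."""
    def __init__(self, fn):
        super().__init__()
        self.fn = fn

    def forward(self, x):
        return self.fn(x) + x

def conv_mixer(dim, depth, kernel_size=9, patch_size=7, n_classes=1000):
    # Patch embedding: a strided convolution splits the image into
    # non-overlapping patch_size x patch_size patches, each mapped to dim channels.
    return nn.Sequential(
        nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size),
        nn.GELU(), nn.BatchNorm2d(dim),
        # `depth` ConvMixer blocks: depthwise conv (spatial mixing) inside a
        # residual connection, then a 1x1 pointwise conv (channel mixing).
        *[nn.Sequential(
            Residual(nn.Sequential(
                nn.Conv2d(dim, dim, kernel_size, groups=dim, padding="same"),
                nn.GELU(), nn.BatchNorm2d(dim))),
            nn.Conv2d(dim, dim, kernel_size=1),
            nn.GELU(), nn.BatchNorm2d(dim))
          for _ in range(depth)],
        # Global average pooling followed by a linear classifier head.
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.Linear(dim, n_classes))
```

Under this naming, ConvMixer-1536/20 would correspond to `conv_mixer(dim=1536, depth=20)`. Note that `padding="same"` in `nn.Conv2d` requires PyTorch 1.9 or later.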