ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases

Authors: Stéphane D’Ascoli, Hugo Touvron, Matthew L Leavitt, Ari S Morcos, Giulio Biroli, Levent Sagun

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We then perform experiments based on the DeiT (Touvron et al., 2020), with a certain number of SA layers replaced by GPSA layers. The resulting Convolutional Vision Transformer (ConViT) outperforms the DeiT while boasting a much improved sample-efficiency (Fig. 2). [see the GPSA sketch after the table]
Researcher Affiliation | Collaboration | 1) Department of Physics, École Normale Supérieure, Paris, France; 2) Facebook AI Research, Paris, France.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and models are released publicly at https://github.com/facebookresearch/convit. [...] We provide an open-source implementation of our method as well as pretrained models at the following address: https://github.com/facebookresearch/convit. [see the model-loading example after the table]
Open Datasets | Yes | The resulting convolutional-like ViT architecture, ConViT, outperforms the DeiT (Touvron et al., 2020) on ImageNet, while offering a much improved sample efficiency.
Dataset Splits | Yes | We compare the sample efficiency of our ConViT-S (see Tab. 1) with that of the DeiT-S by training them on restricted portions of ImageNet-1k, where we only keep a certain fraction of the images of each class. [see the per-class subsampling sketch after the table]
Hardware Specification | Yes | Speed is the number of images processed per second on an NVIDIA Quadro GP100 GPU at batch size 128. [see the throughput-timing sketch after the table]
Software Dependencies | No | The paper mentions basing the work on DeiT and using certain hyperparameters, but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | To maintain stable training while fitting these models on 8 GPUs, we lowered the learning rate from 0.0005 to 0.0004 and the batch size from 1024 to 512.
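
The "Research Type" row refers to DeiT self-attention (SA) blocks being swapped for gated positional self-attention (GPSA). Below is a minimal single-head sketch of the gating idea only, assuming freely learned positional attention logits rather than the paper's relative-position parametrisation; the class name `GPSASketch` and all shapes are illustrative and not taken from the released code.

```python
import torch
import torch.nn as nn


class GPSASketch(nn.Module):
    """Illustrative gated positional self-attention (single head).

    Blends a content attention map with a positional attention map via a
    learnable gate, so the layer can interpolate between SA-like and
    convolution-like behaviour. Sketch only, not facebookresearch/convit.
    """

    def __init__(self, dim: int, num_patches: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.qk = nn.Linear(dim, 2 * dim, bias=False)
        self.v = nn.Linear(dim, dim, bias=False)
        self.proj = nn.Linear(dim, dim)
        # Positional attention logits over patch pairs; the paper derives
        # them from relative patch positions, here they are free parameters.
        self.pos_logits = nn.Parameter(torch.zeros(num_patches, num_patches))
        # Gating scalar: sigmoid(gate) weights the positional term. The paper
        # initialises the gates so that the positional (convolution-like)
        # term is favoured at the start of training.
        self.gate = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim)
        q, k = self.qk(x).chunk(2, dim=-1)
        content = ((q @ k.transpose(-2, -1)) * self.scale).softmax(dim=-1)
        positional = self.pos_logits.softmax(dim=-1)
        g = torch.sigmoid(self.gate)
        attn = (1 - g) * content + g * positional
        return self.proj(attn @ self.v(x))
```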
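
Since the code and pretrained models are public, a checkpoint can be loaded for evaluation in a few lines. The sketch below assumes a timm release that registers the ConViT variants; the registry key `convit_small` comes from timm, not from the paper, and should be treated as an assumption.

```python
import timm
import torch

# Assumption: a timm version that includes the ConViT models; otherwise the
# checkpoints can be obtained from github.com/facebookresearch/convit directly.
model = timm.create_model("convit_small", pretrained=True)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))  # dummy 224x224 RGB input
print(logits.shape)  # expected: torch.Size([1, 1000]) for ImageNet-1k classes
```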
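
For the "Dataset Splits" row, the sample-efficiency runs keep only a fixed fraction of the images of each ImageNet-1k class. A hedged sketch of such per-class subsampling with torchvision's ImageFolder follows; the seeding and rounding choices, and the helper name `subsample_per_class`, are assumptions rather than the authors' exact procedure.

```python
import random
from collections import defaultdict

from torch.utils.data import Subset
from torchvision.datasets import ImageFolder


def subsample_per_class(dataset: ImageFolder, fraction: float, seed: int = 0) -> Subset:
    """Keep roughly `fraction` of the images of every class (illustrative)."""
    by_class = defaultdict(list)
    for idx, (_, label) in enumerate(dataset.samples):
        by_class[label].append(idx)

    rng = random.Random(seed)
    kept = []
    for indices in by_class.values():
        rng.shuffle(indices)
        kept.extend(indices[: max(1, int(len(indices) * fraction))])
    return Subset(dataset, kept)


# Example: keep 10% of each class (the path is a placeholder).
# train_subset = subsample_per_class(ImageFolder("/path/to/imagenet/train"), 0.10)
```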
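
The speed figures in the "Hardware Specification" row are images per second at batch size 128 on a single Quadro GP100. A rough way to measure comparable numbers is sketched below; the warm-up length and iteration count are arbitrary choices not specified in the paper.

```python
import time

import torch


@torch.no_grad()
def images_per_second(model: torch.nn.Module, batch_size: int = 128,
                      iters: int = 50, warmup: int = 10) -> float:
    """Crude throughput estimate (images/s) on the current CUDA device."""
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, 224, 224, device="cuda")
    for _ in range(warmup):          # warm-up: cuDNN autotuning, memory allocs
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()         # wait for all queued kernels to finish
    return batch_size * iters / (time.time() - start)
```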