XCiT: Cross-Covariance Image Transformers

Authors: Alaaeldin Ali, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the effectiveness and generality of XCiT by reporting excellent results on multiple vision benchmarks, including (self-supervised) image classification on ImageNet-1k, object detection and instance segmentation on COCO, and semantic segmentation on ADE20k.
Researcher Affiliation | Collaboration | Facebook AI, Inria, Sorbonne University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks explicitly labeled as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Code: https://github.com/facebookresearch/xcit
Open Datasets | Yes | We use ImageNet-1k [19] to train and evaluate our models for image classification. It consists of 1.28M training images and 50k validation images, labeled across 1,000 semantic categories.
Dataset Splits | Yes | We use ImageNet-1k [19] to train and evaluate our models for image classification. It consists of 1.28M training images and 50k validation images, labeled across 1,000 semantic categories.
Hardware Specification | Yes | All measurements are performed with a batch size of 64 on a single V100-32GB GPU.
Software Dependencies | No | Our implementation is based on the Timm library [72]. Our implementation is based on the mmdetection library [13]. Our implementation is based on the mmsegmentation library [16]. No specific version numbers are provided for these libraries.
Experiment Setup | Yes | We train our model for 400 epochs with the AdamW optimizer [45] using a cosine learning rate decay. In order to enhance the training of larger models, we utilize LayerScale [67] and adjust the stochastic depth [33] for each of our models accordingly (see the supplementary material for details). The model is trained for 36 epochs (3x schedule) using the AdamW optimizer with a learning rate of 10⁻⁴, 0.05 weight decay, and a batch size of 16. We train for 80k and 160k iterations for Semantic FPN and UperNet respectively. Following [44], the models are trained using batch size 16 and an AdamW optimizer with a learning rate of 6×10⁻⁵ and 0.01 weight decay.
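
The experiment-setup evidence above amounts to a conventional training recipe for the ImageNet-1k classifier: AdamW with cosine learning-rate decay over 400 epochs. The sketch below illustrates that schedule in plain PyTorch; it is a minimal illustration under stated assumptions, not the authors' released training script, and `build_xcit_model`, `build_imagenet_loader`, the base learning rate, and the weight decay are placeholders or assumed values.

```python
# Minimal sketch of the quoted classification schedule: AdamW + cosine decay, 400 epochs.
# Not the official https://github.com/facebookresearch/xcit code; model/loader are placeholders.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS = 400          # classification schedule quoted in the table
BASE_LR = 5e-4        # assumed base learning rate (not stated in the quoted text)
WEIGHT_DECAY = 0.05   # assumed; 0.05 is quoted only for the detection setup

def train(model, loader, device="cuda"):
    model.to(device)
    optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)
    scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)  # cosine learning-rate decay
    criterion = torch.nn.CrossEntropyLoss()
    for epoch in range(EPOCHS):
        model.train()
        for images, targets in loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()  # one cosine step per epoch
```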
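
The hardware row likewise states that all speed measurements use a batch size of 64 on a single V100-32GB GPU. The sketch below shows one plausible way to take such a throughput measurement in PyTorch; it is an assumption about the measurement procedure, not the authors' benchmarking code.

```python
# Hypothetical throughput measurement at batch size 64 on a single GPU,
# mirroring the setting quoted in the hardware row. Not the authors' script.
import time
import torch

@torch.no_grad()
def measure_throughput(model, batch_size=64, image_size=224, iters=50, warmup=10):
    device = torch.device("cuda")
    model = model.to(device).eval()
    images = torch.randn(batch_size, 3, image_size, image_size, device=device)
    for _ in range(warmup):      # warm-up iterations, excluded from timing
        model(images)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(images)
    torch.cuda.synchronize()     # wait for all GPU kernels before stopping the clock
    elapsed = time.time() - start
    return batch_size * iters / elapsed  # images per second
```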