XCiT: Cross-Covariance Image Transformers
Authors: Alaaeldin El-Nouby, Hugo Touvron, Mathilde Caron, Piotr Bojanowski, Matthijs Douze, Armand Joulin, Ivan Laptev, Natalia Neverova, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness and generality of XCiT by reporting excellent results on multiple vision benchmarks, including (self-supervised) image classification on ImageNet-1k, object detection and instance segmentation on COCO, and semantic segmentation on ADE20k. |
| Researcher Affiliation | Collaboration | 1Facebook AI 2Inria 3Sorbonne University |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code: https://github.com/facebookresearch/xcit |
| Open Datasets | Yes | We use ImageNet-1k [19] to train and evaluate our models for image classification. It consists of 1.28M training images and 50k validation images, labeled across 1,000 semantic categories. |
| Dataset Splits | Yes | We use ImageNet-1k [19] to train and evaluate our models for image classification. It consists of 1.28M training images and 50k validation images, labeled across 1,000 semantic categories. |
| Hardware Specification | Yes | All measurements are performed with a batch size of 64 on a single V100-32GB GPU. |
| Software Dependencies | No | Our implementation is based on the Timm library [72]. Our implementation is based on the mmdetection library [13]. Our implementation is based on the mmsegmentation library [16]. No specific version numbers are provided for these libraries. |
| Experiment Setup | Yes | We train our model for 400 epochs with the AdamW optimizer [45] using a cosine learning rate decay. In order to enhance the training of larger models, we utilize LayerScale [67] and adjust the stochastic depth [33] for each of our models accordingly (see the supplementary material for details). The model is trained for 36 epochs (3x schedule) using the AdamW optimizer with learning rate of 10⁻⁴, 0.05 weight decay and 16 batch size. We train for 80k and 160k iterations for Semantic FPN and UperNet respectively. Following [44], the models are trained using batch size 16 and an AdamW optimizer with learning rate of 6×10⁻⁵ and 0.01 weight decay. |
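The experiment-setup row above only quotes the classification recipe in prose (400 epochs, AdamW, cosine learning-rate decay). The sketch below shows one way that recipe could be wired up in PyTorch with the timm library used by the official repository; the specific model name, base learning rate, weight decay, and training-loop placeholder are illustrative assumptions, not values confirmed by the quoted text, so this should be read as a minimal sketch rather than the authors' actual training script.

```python
# Minimal sketch of the quoted ImageNet-1k setup: AdamW + cosine LR decay over 400 epochs.
# The model name, base LR, and weight decay below are assumptions for illustration only.
import torch
import timm

model = timm.create_model("xcit_small_12_p16_224", pretrained=False).cuda()

epochs = 400  # quoted in the experiment-setup row
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=5e-4,            # assumed base LR; the row only specifies AdamW with cosine decay
    weight_decay=0.05,  # assumed; 0.05 is quoted only for the detection fine-tuning
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # train_one_epoch(model, loader, optimizer)  # placeholder for a standard ImageNet-1k loop
    scheduler.step()
```

The detection (3x schedule on COCO) and segmentation (80k/160k iterations on ADE20k) fine-tuning runs quoted in the same row follow the mmdetection and mmsegmentation configuration conventions instead, with the AdamW hyperparameters given in the row.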