Inception Transformer
Authors: Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation. For example, our iFormer-S hits the top-1 accuracy of 83.4% on ImageNet-1K, much higher than DeiT-S by 3.6%, and even slightly better than the much bigger model Swin-B (83.3%) with only 1/4 parameters and 1/3 FLOPs. Experimental results show that iFormer surpasses state-of-the-art ViTs and CNNs on several vision tasks, including image classification, object detection and segmentation. |
| Researcher Affiliation | Collaboration | 1 Sea AI Lab, 2 National University of Singapore; {sicy,yuweihao,zhoupan,zhouyc,yansc}@sea.com, xinchao@nus.edu.sg |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code and models are released at https://github.com/sail-sg/iFormer. |
| Open Datasets | Yes | We evaluate iFormer on the ImageNet dataset [28]. We evaluate iFormer on the COCO object detection and instance segmentation tasks [31]. We further evaluate the generality of iFormer through a challenging scene parsing benchmark on semantic segmentation, i.e., ADE20K [32]. |
| Dataset Splits | Yes | For image classification, we evaluate iFormer on the ImageNet dataset [28]. We train the iFormer model with the standard procedure in [6, 22, 29]. Specifically, we use the AdamW optimizer with an initial learning rate of 1 × 10⁻³ via cosine decay [70], a momentum of 0.9, and a weight decay of 0.05. We set the training epoch number as 300 and the input size as 224 × 224. We adopt the same data augmentations and regularization methods as DeiT [29] for fair comparison. ... We also use LayerScale [71] to train deep models. Like previous studies [5, 67], we further fine-tune iFormer on the input size of 384 × 384, with a weight decay of 1 × 10⁻⁸, learning rate of 1 × 10⁻⁵, and batch size of 512. For object detection and instance segmentation: 'trained on 118K images and evaluated on validation set with 5K images'. For semantic segmentation: 'The dataset contains 20K training images and 2K validation images'. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or specific cloud instances) used for running experiments were provided in the paper. It only vaguely mentions 'partial computational resources' and 'GCP research credits'. |
| Software Dependencies | No | The paper mentions 'Timm [72] to implement and train iFormer', the 'mmdetection [78] codebase', and the 'mmsegmentation [80] codebase', but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | For image classification, we use the AdamW optimizer with an initial learning rate of 1 × 10⁻³ via cosine decay [70], a momentum of 0.9, and a weight decay of 0.05. We set the training epoch number as 300 and the input size as 224 × 224. ... fine-tune iFormer on the input size of 384 × 384, with a weight decay of 1 × 10⁻⁸, learning rate of 1 × 10⁻⁵, and batch size of 512. For COCO, we use AdamW with an initial learning rate of 1 × 10⁻⁴, a batch size of 16, and a 1× training schedule with 12 epochs. For ADE20K, we use AdamW with an initial learning rate of 2 × 10⁻⁴ and a cosine learning rate schedule to train 80k iterations. (A minimal sketch of the classification recipe follows this table.) |
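
The classification recipe quoted above is a conventional AdamW-plus-cosine-decay setup. As a minimal sketch only, not the authors' released training code (which builds on timm and lives at https://github.com/sail-sg/iFormer), the snippet below wires those quoted hyperparameters into a plain PyTorch loop; the model and the single dummy batch are stand-ins rather than the actual iFormer-S and ImageNet loader.

```python
# Minimal sketch of the quoted ImageNet-1K recipe: AdamW, initial lr 1e-3 with
# cosine decay, weight decay 0.05, 300 epochs, 224x224 inputs. Model and data
# are stand-ins; the released code uses timm's training pipeline.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))  # stand-in for iFormer-S

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-3,              # initial learning rate 1e-3
    betas=(0.9, 0.999),   # beta1 = 0.9 matches the quoted "momentum of 0.9"
    weight_decay=0.05,
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)  # cosine decay over 300 epochs

# Dummy batch standing in for an ImageNet-1K loader with 224x224 crops
# and DeiT-style augmentation/regularization.
loader = [(torch.randn(8, 3, 224, 224), torch.randint(0, 1000, (8,)))]

for epoch in range(300):
    for images, targets in loader:
        loss = nn.functional.cross_entropy(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The 384 × 384 fine-tuning stage quoted in the table would reuse the same loop with the smaller learning rate (1 × 10⁻⁵), weight decay 1 × 10⁻⁸, and batch size 512; the COCO and ADE20K settings are applied through the mmdetection and mmsegmentation configs rather than a loop like this.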