Inception Transformer

Authors: Chenyang Si, Weihao Yu, Pan Zhou, Yichen Zhou, Xinchao Wang, Shuicheng Yan

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark the iFormer on a series of vision tasks, and showcase that it achieves impressive performance on image classification, COCO detection and ADE20K segmentation. For example, our iFormer-S hits the top-1 accuracy of 83.4% on ImageNet-1K, much higher than DeiT-S by 3.6%, and even slightly better than the much bigger model Swin-B (83.3%) with only 1/4 parameters and 1/3 FLOPs. Experimental results show that iFormer surpasses state-of-the-art ViTs and CNNs on several vision tasks, including image classification, object detection and segmentation.
Researcher Affiliation | Collaboration | 1 Sea AI Lab, 2 National University of Singapore; {sicy,yuweihao,zhoupan,zhouyc,yansc}@sea.com, xinchao@nus.edu.sg
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code and models are released at https://github.com/sail-sg/iFormer.
Open Datasets | Yes | We evaluate iFormer on the ImageNet dataset [28]. We evaluate iFormer on the COCO object detection and instance segmentation tasks [31]. We further evaluate the generality of iFormer through a challenging scene parsing benchmark on semantic segmentation, i.e., ADE20K [32].
Dataset Splits | Yes | For image classification, we evaluate iFormer on the ImageNet dataset [28]. We train the iFormer model with the standard procedure in [6, 22, 29]. Specifically, we use the AdamW optimizer with an initial learning rate of 1 × 10−3 via cosine decay [70], a momentum of 0.9, and a weight decay of 0.05. We set the training epoch number as 300 and the input size as 224 × 224. We adopt the same data augmentations and regularization methods as DeiT [29] for fair comparison. ... We also use LayerScale [71] to train deep models. Like previous studies [5, 67], we further fine-tune iFormer on the input size of 384 × 384, with a weight decay of 1 × 10−8, a learning rate of 1 × 10−5, and a batch size of 512. For object detection and instance segmentation: 'trained on 118K images and evaluated on validation set with 5K images'. For semantic segmentation: 'The dataset contains 20K training images and 2K validation images'. (An illustrative optimizer/scheduler sketch of this classification recipe appears after the table.)
Hardware Specification | No | No specific hardware details (such as GPU/CPU models or specific cloud instances) used for running the experiments are provided in the paper. It only vaguely mentions 'partial computational resources' and 'GCP research credits'.
Software Dependencies | No | The paper mentions using 'Timm [72] to implement and train iFormer', the 'mmdetection [78] codebase', and the 'mmsegmentation [80] codebase', but does not specify version numbers for these software dependencies.
Experiment Setup | Yes | For image classification, we use the AdamW optimizer with an initial learning rate of 1 × 10−3 via cosine decay [70], a momentum of 0.9, and a weight decay of 0.05. We set the training epoch number as 300 and the input size as 224 × 224. ... fine-tune iFormer on the input size of 384 × 384, with a weight decay of 1 × 10−8, a learning rate of 1 × 10−5, and a batch size of 512. For COCO, we use AdamW with an initial learning rate of 1 × 10−4, a batch size of 16, and a 1× training schedule with 12 epochs. For ADE20K, we use AdamW with an initial learning rate of 2 × 10−4 and a cosine learning rate schedule to train for 80k iterations. (An illustrative sketch of the COCO and ADE20K optimizer settings appears after the table.)
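To make the quoted classification recipe concrete, the following is a minimal, illustrative PyTorch sketch of the optimizer and learning-rate schedule it describes. The paper itself trains with the timm codebase; the helper names below and the reading of "momentum of 0.9" as AdamW's beta1 are assumptions, not the authors' code.

```python
# Illustrative sketch only: the ImageNet-1K recipe quoted above
# (AdamW, lr 1e-3 with cosine decay, weight decay 0.05, 300 epochs, 224x224 input).
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_pretrain_optimizer(model: torch.nn.Module, epochs: int = 300):
    optimizer = AdamW(
        model.parameters(),
        lr=1e-3,             # initial learning rate
        betas=(0.9, 0.999),  # beta1 = 0.9, i.e. the reported "momentum of 0.9" (assumption)
        weight_decay=0.05,
    )
    # Cosine decay of the learning rate over the 300 training epochs.
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler

def build_finetune_optimizer(model: torch.nn.Module):
    # 384x384 fine-tuning: lr 1e-5, weight decay 1e-8 (batch size 512).
    return AdamW(model.parameters(), lr=1e-5, weight_decay=1e-8)
```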
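The downstream settings can be sketched the same way. This is likewise only an illustration in plain PyTorch; the paper's detection and segmentation experiments are run with the mmdetection and mmsegmentation codebases, and `detector`/`segmentor` here are placeholder modules.

```python
# Illustrative sketch of the downstream optimizer settings quoted above;
# not the authors' mmdetection/mmsegmentation configs.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_coco_optimizer(detector: torch.nn.Module):
    # COCO detection / instance segmentation: AdamW, lr 1e-4,
    # batch size 16, 1x schedule (12 epochs).
    return AdamW(detector.parameters(), lr=1e-4)

def build_ade20k_optimizer(segmentor: torch.nn.Module, total_iters: int = 80_000):
    # ADE20K semantic segmentation: AdamW, lr 2e-4 with cosine
    # learning-rate decay over 80k iterations.
    optimizer = AdamW(segmentor.parameters(), lr=2e-4)
    scheduler = CosineAnnealingLR(optimizer, T_max=total_iters)
    return optimizer, scheduler
```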