Boosting Vanilla Lightweight Vision Transformers via Re-parameterization

Authors: Zhentao Tan, Xiaodan Li, Yue Wu, Qi Chu, Le Lu, Nenghai Yu, Jieping Ye

ICLR 2024

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | "Experiments demonstrate that our proposed method not only boosts the performance of vanilla ViT-Tiny on various vision tasks to new state-of-the-art (SOTA) but also shows promising generality ability on other networks."
Researcher Affiliation | Collaboration | Alibaba Cloud, Alibaba Group, University of Science and Technology of China, East China Normal University
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | "Code will be available."
Open Datasets | Yes | "We pre-train our lightweight ViT models on ImageNet (Deng et al., 2009), which contains about 1.2M training images. We validate the performance on downstream tasks including image classification on ImageNet (Deng et al., 2009), semantic image segmentation on ADE20K (Zhou et al., 2019), and object detection and instance segmentation on MS COCO (Lin et al., 2014)."
Dataset Splits | Yes | "We validate the performance on downstream tasks including image classification on ImageNet (Deng et al., 2009), semantic image segmentation on ADE20K (Zhou et al., 2019), and object detection and instance segmentation on MS COCO (Lin et al., 2014)."
Hardware Specification | Yes | "Efficiency comparison between pre-training and inference on V100 GPUs."
Software Dependencies | No | The paper mentions a "PyTorch-style implementation", the "AdamW optimizer", the "BEiT semantic segmentation codebase", and the "detectron2 codebase" but does not provide version numbers for these software components.
Experiment Setup | Yes | "We use AdamW optimizer (Loshchilov & Hutter, 2017) (with the initial learning rate 2.4e-3, weight decay 0.05, and batch size 4096) to train the model for 300 epochs." and "Table 4: Fine-tuning settings of ViT-Tiny for ImageNet classification."
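
The Experiment Setup row translates directly into a standard PyTorch optimizer configuration. The following is a minimal sketch that uses only the hyperparameters quoted above; the model is a placeholder stand-in for ViT-Tiny, and no learning-rate schedule is shown because the quoted text does not specify one.

import torch

model = torch.nn.Linear(192, 1000)  # placeholder; the paper pre-trains a ViT-Tiny

# Hyperparameters quoted from the paper's pre-training setup.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2.4e-3,          # initial learning rate
    weight_decay=0.05,
)
epochs = 300            # total pre-training epochs
batch_size = 4096       # global batch size, typically sharded across GPUs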
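
For context on the technique named in the title, the sketch below illustrates the general idea of structural re-parameterization in PyTorch: parallel branches used during training are algebraically folded into a single layer for inference. The RepLinear module and its two-branch design are illustrative assumptions for exposition, not the paper's actual architecture.

import torch
import torch.nn as nn

class RepLinear(nn.Module):
    # Train-time module: two parallel linear branches whose outputs are summed.
    def __init__(self, dim):
        super().__init__()
        self.main = nn.Linear(dim, dim)
        self.aux = nn.Linear(dim, dim)  # extra branch, folded away after training

    def forward(self, x):
        return self.main(x) + self.aux(x)

    @torch.no_grad()
    def reparameterize(self):
        # main(x) + aux(x) = x @ (W_m + W_a).T + (b_m + b_a), so both
        # branches collapse into one equivalent nn.Linear for inference.
        fused = nn.Linear(self.main.in_features, self.main.out_features)
        fused.weight.copy_(self.main.weight + self.aux.weight)
        fused.bias.copy_(self.main.bias + self.aux.bias)
        return fused

x = torch.randn(2, 192)
m = RepLinear(192)
assert torch.allclose(m(x), m.reparameterize()(x), atol=1e-6)

The fused layer computes one matrix multiplication instead of two while producing identical outputs, which is what lets re-parameterized models keep train-time capacity without inference-time cost.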