Boosting Vanilla Lightweight Vision Transformers via Re-parameterization
Authors: Zhentao Tan, Xiaodan Li, Yue Wu, Qi Chu, Le Lu, Nenghai Yu, Jieping Ye
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our proposed method not only boosts the performance of vanilla ViT-Tiny on various vision tasks to new state-of-the-art (SOTA) but also shows promising generality ability on other networks. |
| Researcher Affiliation | Collaboration | Alibaba Cloud, Alibaba Group, University of Science and Technology of China, East China Normal University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code will be available. |
| Open Datasets | Yes | We pre-train our lightweight ViT models on ImageNet (Deng et al., 2009), which contains about 1.2M training images. We validate the performance on downstream tasks including image classification on ImageNet (Deng et al., 2009), semantic image segmentation on ADE20K (Zhou et al., 2019), and object detection and instance segmentation on MS COCO (Lin et al., 2014). |
| Dataset Splits | Yes | We validate the performance on downstream tasks including image classification on ImageNet (Deng et al., 2009), semantic image segmentation on ADE20K (Zhou et al., 2019), and object detection and instance segmentation on MS COCO (Lin et al., 2014). |
| Hardware Specification | Yes | Efficiency comparison between pre-training and inference on V100 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch-style implementation", "AdamW optimizer", "BEiT semantic segmentation codebase", and "detectron2 codebase" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use the AdamW optimizer (Loshchilov & Hutter, 2017) with an initial learning rate of 2.4e-3, weight decay of 0.05, and batch size of 4096 to train the model for 300 epochs. See also Table 4: Fine-tuning settings of ViT-Tiny for ImageNet classification. A hedged configuration sketch follows this table. |
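
The last row quotes the paper's reported pre-training hyperparameters. Since the paper describes a "PyTorch-style implementation" but its code is not yet released, the snippet below is only a minimal sketch wiring those reported AdamW settings to a ViT-Tiny backbone; the `timm` model name and the cosine schedule are our assumptions, not details from the paper.

```python
import torch
import timm  # assumption: timm's ViT-Tiny as a stand-in for the paper's backbone

# Plain ViT-Tiny backbone; the paper's re-parameterized variant is not public.
model = timm.create_model("vit_tiny_patch16_224", pretrained=False)

# Pre-training optimizer as reported in the paper: AdamW with initial
# learning rate 2.4e-3 and weight decay 0.05 (batch size 4096, 300 epochs).
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2.4e-3,
    weight_decay=0.05,
)

# Cosine decay over the reported 300 epochs is a common choice for this kind
# of recipe; the paper does not specify its exact schedule, so this is a guess.
epochs = 300
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
```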