Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Boosting Vanilla Lightweight Vision Transformers via Re-parameterization
Authors: Zhentao Tan, Xiaodan Li, Yue Wu, Qi Chu, Le Lu, Nenghai Yu, Jieping Ye
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our proposed method not only boosts the performance of vanilla ViT-Tiny on various vision tasks to new state-of-the-art (SOTA) but also shows promising generality ability on other networks. |
| Researcher Affiliation | Collaboration | Alibaba Cloud, Alibaba Group, University of Science and Technology of China, East China Normal University |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code will be available. |
| Open Datasets | Yes | we pre-train our lightweight ViT models on ImageNet (Deng et al., 2009) which contains about 1.2M training images. We validate the performance on downstream tasks including image classification on ImageNet (Deng et al., 2009), semantic image segmentation on ADE20K (Zhou et al., 2019), object detection and instance segmentation on MS COCO (Lin et al., 2014). |
| Dataset Splits | Yes | We validate the performance on downstream tasks including image classification on ImageNet (Deng et al., 2009), semantic image segmentation on ADE20K (Zhou et al., 2019), object detection and instance segmentation on MS COCO (Lin et al., 2014). |
| Hardware Specification | Yes | Efficiency comparison between pre-training and inference on V100 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch-style implementation", "AdamW optimizer", "BEiT semantic segmentation codebase", and "detectron2 codebase" but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We use AdamW optimizer (Loshchilov & Hutter, 2017) (with the initial learning rate 2.4e-3, weight decay 0.05, and batch size 4096) to train the model for 300 epochs. See also Table 4: Fine-tuning settings of ViT-Tiny for ImageNet classification. |
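The quoted experiment setup maps directly onto a standard PyTorch optimizer configuration. Below is a minimal sketch of that setup, assuming PyTorch; the stand-in model, dummy batch, and single training step are illustrative placeholders, not the authors' actual ViT-Tiny training code.

```python
import torch
import torch.nn as nn

# Stand-in module for ViT-Tiny (embedding dim 192); the real model is not
# reproduced here.
model = nn.Linear(192, 1000)

# Hyperparameters quoted in the paper's setup: AdamW with initial learning
# rate 2.4e-3 and weight decay 0.05 (batch size 4096, 300 epochs).
optimizer = torch.optim.AdamW(model.parameters(), lr=2.4e-3, weight_decay=0.05)

# One illustrative optimization step on a small dummy batch.
x = torch.randn(8, 192)
loss = model(x).square().mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice a batch size of 4096 is typically reached via data parallelism and/or gradient accumulation across GPUs, with a learning-rate schedule over the 300 epochs; those details are not specified in the quoted snippet.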