MG-ViT: A Multi-Granularity Method for Compact and Efficient Vision Transformers

Authors: Yu Zhang, Yepeng Liu, Duoqian Miao, Qi Zhang, Yiwei Shi, Liang Hu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments prove the effectiveness of the multi-granularity strategy. For instance, on ImageNet, without any loss of performance, MG-ViT reduces 47% FLOPs of LV-ViT-S and 56% FLOPs of DeiT-S."
Researcher Affiliation | Academia | Yu Zhang¹, Yepeng Liu², Duoqian Miao¹, Qi Zhang¹, Yiwei Shi³, Liang Hu¹ (¹Tongji University, ²University of Florida, ³University of Bristol)
Pseudocode | No | The paper contains architectural diagrams and mathematical formulas but does not include any explicit pseudocode blocks or sections labeled "Algorithm".
Open Source Code | No | The paper does not include any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We select LV-ViT [17] and DeiT [15] to assess the performance of MG-ViT on ImageNet [18]. ... object detection and semantic segmentation on the MS-COCO [19] and ADE20K [20] datasets, respectively."
Dataset Splits | No | The paper uses standard datasets (ImageNet, MS-COCO, ADE20K) that have predefined splits, but it does not explicitly state training/validation/test splits by percentage, absolute sample counts, or specific split files.
Hardware Specification | Yes | "All metrics are measured on a single NVIDIA RTX 3090 GPU."
Software Dependencies | No | The paper mentions using the AdamW optimizer but does not specify version numbers for key software components or libraries such as Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | "The resolution of input images in our experiments is 224×224. In SGIS, we split each image into 7×7 patches. r_h and r_t are set to 0.1 and 0.4, respectively. For DeiT-S, we inserted a total of three PPSM modules in the 3rd, 7th, and 10th layers. For conducting the training process, we set the batch size to 256 and use the AdamW optimizer to train all models for 300 epochs."
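
The Experiment Setup row can be summarized in a short configuration sketch. The snippet below is a minimal illustration assuming PyTorch; the dictionary keys, the stand-in model, and the learning rate/weight decay values are assumptions added for readability and are not taken from the paper.

```python
# Minimal sketch of the reported training setup, assuming PyTorch.
# MG-ViT itself is not reproduced here; a stand-in module makes the
# optimizer call runnable. Key names (e.g. "ppsm_layers") are illustrative.
import torch
import torch.nn as nn

config = {
    "input_resolution": 224,    # 224x224 input images
    "patch_grid": (7, 7),       # SGIS splits each image into 7x7 patches
    "r_h": 0.1,                 # reported value of r_h
    "r_t": 0.4,                 # reported value of r_t
    "ppsm_layers": (3, 7, 10),  # PPSM inserted at the 3rd, 7th, and 10th layers of DeiT-S
    "batch_size": 256,
    "epochs": 300,
}

# Stand-in for the MG-ViT/DeiT-S backbone (not the authors' architecture).
model = nn.Linear(3 * config["input_resolution"] ** 2, 1000)

# The paper names the AdamW optimizer, but the excerpt gives no learning rate
# or weight decay; the values below are assumptions for illustration only.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
```

A full reproduction would additionally require the ImageNet data pipeline and the MG-ViT/DeiT-S backbone with its PPSM modules, which the excerpt does not describe in enough detail to reconstruct.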