Revisiting the Integration of Convolution and Attention for Vision Backbone
Authors: Lei Zhu, Xinjiang Wang, Wayne Zhang, Rynson Lau
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on various vision tasks, we empirically verify the potential of the proposed integration scheme, named GLMix |
| Researcher Affiliation | Collaboration | Lei Zhu (City University of Hong Kong, ray.leizhu@outlook.com); Xinjiang Wang (SenseTime Research, wangxinjiang@sensetime.com); Wayne Zhang (SenseTime Research, wayne.zhang@sensetime.com); Rynson Lau (City University of Hong Kong, Rynson.Lau@cityu.edu.hk) |
| Pseudocode | No | The paper does not contain a block explicitly labeled as "Pseudocode" or "Algorithm", nor does it present structured steps in a pseudocode format. |
| Open Source Code | No | Code will be available at https://github.com/rayleizhu/GLMix. |
| Open Datasets | Yes | We conduct image classification experiments on the ImageNet-1k dataset [13]... We evaluate the backbones for object detection and instance segmentation on COCO 2017 [31]... Our semantic segmentation experiments are conducted on the ADE20K dataset... |
| Dataset Splits | No | The paper refers to "standard training recipes" on existing datasets (ImageNet-1k, COCO, ADE20K), but it does not itself state the specific validation splits used (percentages, counts, or an explicit statement that a predefined validation split was used). |
| Hardware Specification | Yes | following the same hardware (a single Tesla V100 32G GPU) and batch size (128) configurations used in Swin-Transformer [35]. |
| Software Dependencies | No | The paper mentions using "the timm library [54]", "the MMDetection [4] toolbox", and "the MMSegmentation [10] toolbox" but does not specify the version numbers for these software components. |
| Experiment Setup | Yes | For the standard supervised training recipe, training details are in Table 8. When training with the advanced distillation recipe [26], we add an extra distillation head to the GLNet-4G/9G model and use the NFNet-F6 [2] to generate distillation targets; other training details are shown in Table 9. |
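The hardware row above refers to the single-GPU throughput benchmarking protocol of Swin-Transformer (one Tesla V100 32G, batch size 128). As a hedged illustration of how such a throughput number is typically measured, here is a minimal PyTorch sketch; the warm-up/iteration counts and the placeholder backbone are assumptions for demonstration, not the paper's actual GLNet code:

```python
import time
import torch
import torch.nn as nn

def measure_throughput(model, batch_size=128, image_size=224,
                       n_warmup=10, n_iters=30):
    """Return inference throughput in images/sec for a vision backbone.

    Follows the common benchmarking pattern: warm-up passes first, then
    timed passes, synchronizing the GPU around the timed region so that
    asynchronous CUDA kernels are fully accounted for.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(batch_size, 3, image_size, image_size, device=device)
    with torch.no_grad():
        for _ in range(n_warmup):      # warm-up: stabilize clocks/caches
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_iters):       # timed passes
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return n_iters * batch_size / elapsed

# Placeholder backbone for illustration only (not GLNet):
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
print(f"{measure_throughput(backbone, batch_size=8, image_size=32):.1f} img/s")
```

The synchronize calls matter: without them, `time.perf_counter()` would measure only kernel-launch time on GPU, inflating the reported throughput.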