On the Connection between Local Attention and Dynamic Depth-wise Convolution
Authors: Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, Jingdong Wang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The connection between local attention and dynamic depth-wise convolution is verified empirically through an ablation study on weight sharing and dynamic weight computation in the Local Vision Transformer and (dynamic) depth-wise convolution. We empirically observe that the models based on depth-wise convolution and the dynamic variants with lower computation complexity perform on par with or slightly better than Swin Transformer, an instance of Local Vision Transformer, for ImageNet classification, COCO object detection, and ADE20K semantic segmentation. (A minimal sketch of the static and dynamic depth-wise convolution operations follows the table.) |
| Researcher Affiliation | Collaboration | TKLNDST, CS, Nankai University; Peking University; Microsoft Research Asia; Baidu Inc. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/Atten4Vis/DemystifyLocalViT. |
| Open Datasets | Yes | The ImageNet-1K recognition dataset (Deng et al., 2009) contains 1.28M training images and 50K validation images across 1,000 classes. We use exactly the same training setting as Swin Transformer (Liu et al., 2021b). The AdamW optimizer (Loshchilov & Hutter, 2019) is adopted for 300 epochs, with a cosine decay learning rate scheduler and 20 epochs of linear warm-up. The weight decay is 0.05, and the initial learning rate is 0.001. The augmentation and regularization strategies include RandAugment (Cubuk et al., 2020), Mixup (Zhang et al., 2018a), CutMix (Yun et al., 2019), stochastic depth (Huang et al., 2016), etc. |
| Dataset Splits | Yes | The ImageNet-1K recognition dataset (Deng et al., 2009) contains 1.28M training images and 50K validation images across 1,000 classes. |
| Hardware Specification | No | The paper does not explicitly state the specific hardware used for running its experiments (e.g., specific GPU or CPU models). |
| Software Dependencies | No | The paper mentions software components like AdamW, Cascade Mask R-CNN, UPerNet, and MMSegmentation, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The AdamW optimizer (Loshchilov & Hutter, 2019) is adopted for 300 epochs, with a cosine decay learning rate scheduler and 20 epochs of linear warm-up. The weight decay is 0.05, and the initial learning rate is 0.001. The stochastic depth rate is set to 0.2 and 0.5 for the tiny and base models, respectively. (A hedged sketch of this optimization schedule follows the table.) |
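To make the ablated design axes concrete, below is a minimal sketch, assuming PyTorch, that contrasts a static depth-wise convolution (one kernel per channel, shared across positions and samples) with a dynamic variant whose per-channel kernels are predicted from the input, which is the sense in which the paper relates depth-wise convolution to local attention. The module names and the global-average-pooling kernel predictor are illustrative assumptions, not code from the released DemystifyLocalViT repository.

```python
# Hedged sketch (assumes PyTorch): static vs. dynamic depth-wise convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthWiseConv(nn.Module):
    """Static depth-wise convolution: one kernel per channel, shared across positions."""

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels, bias=False,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        return self.conv(x)


class DynamicDepthWiseConv(nn.Module):
    """Dynamic variant (illustrative): per-channel kernels are predicted from a
    global descriptor of the input, so the aggregation weights depend on the
    content, analogous to the attention weights in local attention."""

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        self.kernel_size = kernel_size
        self.predict = nn.Linear(channels, channels * kernel_size * kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Predict one kernel per channel from globally average-pooled features.
        desc = x.mean(dim=(2, 3))                                     # (B, C)
        kernels = self.predict(desc).view(b * c, 1, self.kernel_size, self.kernel_size)
        # Apply per-sample, per-channel kernels via a grouped convolution.
        out = F.conv2d(
            x.reshape(1, b * c, h, w), kernels,
            padding=self.kernel_size // 2, groups=b * c,
        )
        return out.view(b, c, h, w)
```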
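The reported optimization settings (AdamW, initial learning rate 0.001, weight decay 0.05, 300 epochs, cosine decay with 20 warm-up epochs) can be sketched in plain PyTorch as below. This is a hedged illustration only: the released code follows the Swin Transformer recipe, and the dummy model, the warm-up start factor, and the epoch-level scheduler stepping are assumptions; stochastic depth (drop-path 0.2 / 0.5 for tiny / base) would live inside the actual model definition.

```python
# Hedged sketch of the reported optimization schedule (not the authors' script).
import torch
import torch.nn as nn

EPOCHS, WARMUP_EPOCHS = 300, 20
BASE_LR, WEIGHT_DECAY = 1e-3, 0.05

# Dummy stand-in for the tiny/base backbone used in the paper.
model = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=7, padding=3),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(96, 1000),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

# 20 epochs of linear warm-up, then cosine decay for the remaining epochs.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, end_factor=1.0, total_iters=WARMUP_EPOCHS)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS - WARMUP_EPOCHS)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[WARMUP_EPOCHS])

for epoch in range(EPOCHS):
    # ... one training epoch over ImageNet-1K (forward, loss, backward,
    # optimizer.step()) would go here ...
    scheduler.step()
```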