CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
Authors: Wenxiao Wang, Lu Yao, Long Chen, Binbin Lin, Deng Cai, Xiaofei He, Wei Liu
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that CrossFormer outperforms the other vision transformers on image classification, object detection, instance segmentation, and semantic segmentation tasks. |
| Researcher Affiliation | Collaboration | 1State Key Lab of CAD & CG, Zhejiang University 2Data Platform, Tencent 3Columbia University 4School of Software Technology, Zhejiang University |
| Pseudocode | Yes | Algorithm 1 LSDA code (PyTorch-like) |
| Open Source Code | Yes | 1The code has been released: https://github.com/cheerss/CrossFormer |
| Open Datasets | Yes | The experiments on image classification are done with the ImageNet (Russakovsky et al., 2015) dataset. The experiments on object detection and instance segmentation are both done on the COCO 2017 dataset (Lin et al., 2014). ADE20K (Zhou et al., 2017) is used as the benchmark for semantic segmentation. |
| Dataset Splits | Yes | ImageNet: "The models are trained on 1.28M training images and tested on 50K validation images." COCO: "COCO 2017 dataset (Lin et al., 2014), which contains 118K training and 5K val images." |
| Hardware Specification | Yes | The batch size is 1,024 split on 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions software like AdamW, MMDetection, and MMSegmentation, but does not provide specific version numbers for these or other key software dependencies required for reproduction. |
| Experiment Setup | Yes | In particular, we use an AdamW (Kingma & Ba, 2015) optimizer training for 300 epochs with a cosine decay learning rate scheduler, and 20 epochs of linear warm-up are used. The batch size is 1,024 split on 8 V100 GPUs. An initial learning rate of 0.001 and a weight decay of 0.05 are used. Besides, we use drop path rates of 0.1, 0.2, 0.3, 0.5 for CrossFormer-T, CrossFormer-S, CrossFormer-B, CrossFormer-L, respectively. Further, similar to Swin (Liu et al., 2021b), RandAugment (Cubuk et al., 2020), Mixup (Zhang et al., 2018a), CutMix (Yun et al., 2019), random erasing (Zhong et al., 2020), and stochastic depth (Huang et al., 2016) are used for data augmentation. |
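The learning-rate schedule described in the setup row (20 epochs of linear warm-up into a cosine decay over 300 total epochs, base rate 0.001) can be sketched as a small pure-Python function. This is a minimal sketch of one common warmup-plus-cosine formulation; the paper quotes only the hyperparameters, not the exact schedule equation, so `min_lr` and the per-epoch granularity are assumptions for illustration.

```python
import math

# Hyperparameters quoted in the Experiment Setup row above.
TOTAL_EPOCHS = 300
WARMUP_EPOCHS = 20
BASE_LR = 1e-3

def lr_at_epoch(epoch: int, base_lr: float = BASE_LR,
                warmup: int = WARMUP_EPOCHS, total: int = TOTAL_EPOCHS,
                min_lr: float = 0.0) -> float:
    """Linear warm-up followed by cosine decay (one common formulation;
    min_lr=0 is an assumption, not stated in the paper)."""
    if epoch < warmup:
        # Linear ramp from base_lr/warmup up to base_lr over the warm-up epochs.
        return base_lr * (epoch + 1) / warmup
    # Cosine decay from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup) / max(1, total - warmup)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In a PyTorch training loop this kind of schedule is typically realized by chaining a linear-warmup scheduler with `CosineAnnealingLR`, or by setting each parameter group's `lr` from a function like the one above at every epoch.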