CoAtFormer: Vision Transformer with Composite Attention
Authors: Zhiyong Chang, Mingjun Yin, Yan Wang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show our CoAtFormer achieves state-of-the-art results on various different tasks. |
| Researcher Affiliation | Collaboration | Zhiyong Chang (Peking University), Mingjun Yin (The University of Melbourne), Yan Wang (Zuoyebang) |
| Pseudocode | No | The paper includes mathematical formulations (Equations 1–19) and architectural diagrams (Figures 2 and 3), but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statement about releasing source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We conduct experiments on ImageNet-1K [Deng et al., 2009] classification, COCO [Lin et al., 2014] object detection and instance segmentation, and ADE20K [Zhou et al., 2017] semantic segmentation. |
| Dataset Splits | Yes | We conduct experiments on ImageNet-1K [Deng et al., 2009] classification, COCO [Lin et al., 2014] object detection and instance segmentation, and ADE20K [Zhou et al., 2017] semantic segmentation. For fair comparison, we follow the same training strategies as previous works [Touvron et al., 2020; Liu et al., 2021]. |
| Hardware Specification | No | The paper mentions computational costs (e.g., FLOPs) and parameters but does not specify any hardware details like GPU/CPU models, memory, or specific cloud computing instances used for experiments. |
| Software Dependencies | No | The paper mentions software components such as 'AdamW optimizer', 'Mask R-CNN', 'Cascade Mask R-CNN', 'UperNet', and 'GELU activation', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | For fair comparison, we follow the same training strategies as previous works [Touvron et al., 2020; Liu et al., 2021]. Specifically, we train all our models for 300 epochs with the input size of 224×224. We employ the AdamW optimizer with weight decay of 0.05. The default batch size and initial learning rate are set to 1024 and 0.001. |
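
For readers attempting a reproduction, the sketch below assembles the hyperparameters quoted in the Experiment Setup row (300 epochs, 224×224 inputs, AdamW with weight decay 0.05, batch size 1024, initial learning rate 0.001) into a minimal PyTorch training loop. Since no official code is released, the `CoAtFormer` model class is a hypothetical placeholder, and the data augmentation and cosine learning-rate schedule are assumptions borrowed from the cited DeiT/Swin training strategies rather than details confirmed by the paper.

```python
# Minimal sketch of the reported ImageNet-1K training setup.
# `CoAtFormer` is a placeholder: the paper releases no code, so the model,
# augmentation, and schedule details are assumptions, not the authors' recipe.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hyperparameters quoted in the reproducibility table above.
EPOCHS = 300
INPUT_SIZE = 224
BATCH_SIZE = 1024
BASE_LR = 1e-3
WEIGHT_DECAY = 0.05

# Standard ImageNet preprocessing (assumed; the paper only states that it
# follows the training strategies of DeiT and Swin).
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(INPUT_SIZE),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("path/to/imagenet/train", transform=train_tf)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE,
                          shuffle=True, num_workers=16, pin_memory=True)

model = CoAtFormer()  # hypothetical class standing in for the paper's model
optimizer = torch.optim.AdamW(model.parameters(),
                              lr=BASE_LR, weight_decay=WEIGHT_DECAY)
# Cosine decay over 300 epochs is an assumption consistent with DeiT/Swin.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    model.train()
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Note that the effective batch size of 1024 typically implies multi-GPU data-parallel training; the paper does not report the hardware used, so that part of the setup remains unspecified.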