AS-MLP: An Axial Shifted MLP Architecture for Vision
Authors: Dongze Lian, Zehao Yu, Xing Sun, Shenghua Gao
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of our AS-MLP, we conduct experiments of image classification on the ImageNet-1K benchmark... All image classification results are shown in Table 1. We divide all network architectures into CNN-based, Transformer-based and MLP-based architectures... The experimental results show that our model significantly exceeds Swin Transformer (Liu et al., 2021b) in the mobile setting (76.05% vs. 75.11%). We also compare different connection types of the AS-MLP block, such as serial connection and parallel connection, and the results are shown in Table 3b. |
| Researcher Affiliation | Collaboration | Dongze Lian, Zehao Yu (ShanghaiTech University, {liandz,yuzh}@shanghaitech.edu.cn); Xing Sun (Youtu Lab, Tencent, winfredsun@tencent.com); Shenghua Gao (ShanghaiTech University & Shanghai Engineering Research Center of Intelligent Vision and Imaging & Shanghai Engineering Research Center of Energy Efficient and Custom AI IC, gaoshh@shanghaitech.edu.cn) |
| Pseudocode | Yes | Algorithm 1: Code of AS-MLP block in a PyTorch-like style. (A hedged sketch of such a block appears below the table.) |
| Open Source Code | Yes | Code is available at https://github.com/svip-lab/AS-MLP. |
| Open Datasets | Yes | To evaluate the effectiveness of our AS-MLP, we conduct experiments of image classification on the ImageNet-1K benchmark, which is collected in (Deng et al., 2009). It contains 1.28M training images and 50K validation images from a total of 1000 classes. For object detection and instance segmentation, we employ mmdetection (Chen et al., 2019) as the framework and COCO (Lin et al., 2014) as the evaluation dataset, which consists of 118K training images and 5K validation images. Following Swin Transformer (Liu et al., 2021b), we conduct experiments of AS-MLP on the challenging semantic segmentation dataset, ADE20K, which contains 20,210 training images and 2,000 validation images. |
| Dataset Splits | Yes | It contains 1.28M training images and 50K validation images from a total of 1000 classes. COCO (Lin et al., 2014) as the evaluation dataset, which consists of 118K training images and 5K validation images. ADE20K, which contains 20,210 training images and 2,000 validation images. |
| Hardware Specification | Yes | Throughput is measured with a batch size of 64 on a single V100 GPU (32GB). (A hedged measurement sketch appears below the table.) |
| Software Dependencies | No | The paper mentions `import torch` and `import torch.nn.functional as F` in Algorithm 1, implying the use of PyTorch. However, no specific version numbers are provided for PyTorch or any other software components. |
| Experiment Setup | Yes | We use an initial learning rate of 0.001 with cosine decay and 20 epochs of linear warm-up. The AdamW (Loshchilov & Hutter, 2019) optimizer is employed to train the whole model for 300 epochs with a batch size of 1024. Following the training strategy of Swin Transformer (Liu et al., 2021b), we also use label smoothing (Szegedy et al., 2016) with a smoothing ratio of 0.1 and the DropPath (Huang et al., 2016) strategy. For object detection: optimizer (AdamW), learning rate (0.0001), weight decay (0.05), and batch size (2 images per GPU × 8 GPUs). For semantic segmentation: optimizer (AdamW), learning rate (6 × 10⁻⁵), weight decay (0.01), and batch size (2 images per GPU × 8 GPUs). The input image resolution is 512 × 512, the stochastic depth ratio is set to 0.3, and all models are initialized with weights pre-trained on ImageNet-1K and trained for 160K iterations. (A hedged optimizer/scheduler sketch appears below the table.) |
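
The Algorithm 1 referenced in the Pseudocode row shifts channel groups along the two spatial axes between channel-mixing projections. Below is a minimal PyTorch sketch of that idea, not the authors' exact code: the `axial_shift` helper, the `ASMLPBlock` class, the 1×1-conv channel MLPs, and the GroupNorm stand-in for the paper's normalization layer are all assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def axial_shift(x, shift_size=5, dim=3):
    # Split channels of x (B, C, H, W) into `shift_size` groups and displace
    # group i by (i - shift_size // 2) pixels along spatial axis `dim`,
    # zero-filling at the borders (pad -> roll -> crop achieves this).
    pad = shift_size // 2
    x = F.pad(x, (pad, pad, pad, pad), "constant", 0)
    chunks = torch.chunk(x, shift_size, dim=1)
    shifted = [torch.roll(c, s, dims=dim)
               for c, s in zip(chunks, range(-pad, pad + 1))]
    x = torch.cat(shifted, dim=1)
    return x[:, :, pad:-pad, pad:-pad]  # crop back to the original size


class ASMLPBlock(nn.Module):
    # Simplified AS-MLP block: channel MLPs (as 1x1 convs) around parallel
    # axial shifts along height and width, plus a residual connection.
    def __init__(self, dim, shift_size=5):
        super().__init__()
        self.shift_size = shift_size
        self.norm = nn.GroupNorm(1, dim)      # stand-in for the paper's norm
        self.proj_in = nn.Conv2d(dim, dim, 1)
        self.proj_h = nn.Conv2d(dim, dim, 1)
        self.proj_w = nn.Conv2d(dim, dim, 1)
        self.proj_out = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        y = F.gelu(self.proj_in(self.norm(x)))
        h = self.proj_h(axial_shift(y, self.shift_size, dim=2))  # vertical
        w = self.proj_w(axial_shift(y, self.shift_size, dim=3))  # horizontal
        return x + self.proj_out(h + w)


# Hypothetical usage: a 96-channel feature map at 56x56 resolution.
block = ASMLPBlock(dim=96, shift_size=5)
out = block(torch.randn(2, 96, 56, 56))
print(out.shape)  # torch.Size([2, 96, 56, 56])
```

The two branches here are summed, i.e. the parallel connection that Table 3b compares against a serial one; the serial variant would instead feed the height-shifted output into the width shift.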
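The throughput figure in the Hardware Specification row can be reproduced with a simple timing loop. This is a generic sketch under the stated batch size of 64 on one GPU; the warm-up count, iteration count, 224×224 input resolution, and function name are assumptions, not details from the paper.

```python
import time
import torch


@torch.no_grad()
def measure_throughput(model, batch_size=64, resolution=224,
                       warmup=10, iters=50):
    # Returns images/second for a forward-only pass, mirroring the paper's
    # setting of batch size 64 on a single GPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(batch_size, 3, resolution, resolution, device=device)
    for _ in range(warmup):          # warm-up to stabilize clocks and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()     # drain queued kernels before timing
    start = time.time()
    for _ in range(iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return batch_size * iters / (time.time() - start)
```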
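For the ImageNet-1K recipe quoted in the Experiment Setup row (AdamW, initial learning rate 0.001, 20-epoch linear warm-up, cosine decay over 300 epochs, label smoothing 0.1), a minimal sketch follows. The weight-decay value of 0.05 and the per-step (rather than per-epoch) scheduling are assumptions; `label_smoothing` in `CrossEntropyLoss` requires PyTorch ≥ 1.10.

```python
import math
import torch
from torch.optim.lr_scheduler import LambdaLR


def build_optimizer_and_scheduler(model, steps_per_epoch, base_lr=1e-3,
                                  warmup_epochs=20, total_epochs=300):
    # AdamW with linear warm-up then cosine decay, per the quoted setup.
    # weight_decay=0.05 is an assumption (the paper states it only for
    # the detection setting), as is scheduling per step.
    optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr,
                                  weight_decay=0.05)
    warmup_steps = warmup_epochs * steps_per_epoch
    total_steps = total_epochs * steps_per_epoch

    def lr_lambda(step):
        if step < warmup_steps:                      # linear warm-up
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))  # cosine decay

    scheduler = LambdaLR(optimizer, lr_lambda)
    criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
    return optimizer, scheduler, criterion
```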