RaMLP: Vision MLP via Region-aware Mixing
Authors: Shenqi Lai, Xi Du, Jia Guo, Kaipeng Zhang
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% APb or 1.0% mIoU) on dense prediction tasks. The training code could be found at https://github.com/xiaolai-sqlai/RaMLP. |
| Researcher Affiliation | Collaboration | Shenqi Lai1, Xi Du2, Jia Guo1 and Kaipeng Zhang3 1InsightFace.ai 2Kiwi Tech 3Shanghai AI Laboratory laishenqi@qq.com, leo.du@kiwiar.com, guojia@gmail.com, kpzhang@foxmail.com |
| Pseudocode | No | The paper describes the model architecture and its components but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The training code could be found at https://github.com/xiaolai-sqlai/RaMLP. |
| Open Datasets | Yes | We train our models on the ImageNet-1K [Deng et al., 2009] dataset from scratch, which contains 1.2M training images and 50K validation images evenly spreading 1,000 categories. We report the top-1 accuracy on the validation set following the standard practice in this community. For fair comparisons, our training strategy is mostly adopted from CycleMLP, including RandAugment, Mixup, CutMix, random erasing, and stochastic depth. AdamW and cosine learning rate schedules with the initial value of 1×10⁻³ are adopted. All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512. |
| Dataset Splits | Yes | We train our models on the ImageNet-1K [Deng et al., 2009] dataset from scratch, which contains 1.2M training images and 50K validation images evenly spreading 1,000 categories. |
| Hardware Specification | Yes | All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512. |
| Software Dependencies | No | The paper mentions optimizers (AdamW) and training strategies (RandAugment, Mixup, CutMix, random erasing, stochastic depth) and uses frameworks like RetinaNet and Mask R-CNN, but does not provide specific version numbers for any software dependencies (e.g., PyTorch version, Python version, specific library versions). |
| Experiment Setup | Yes | For fair comparisons, our training strategy is mostly adopted from CycleMLP, including RandAugment, Mixup, CutMix, random erasing, and stochastic depth. AdamW and cosine learning rate schedules with the initial value of 1×10⁻³ are adopted. All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512. (A hedged configuration sketch based on this description follows the table.) |
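
The Experiment Setup row pins down only a few hyperparameters: AdamW with a cosine learning-rate schedule, an initial learning rate of 1×10⁻³, 300 epochs, a 20-epoch warm-up, and a batch size of 512. The following is a minimal PyTorch sketch of that optimizer and schedule, assembled under stated assumptions; the placeholder model, the weight-decay value, and the warm-up start factor are assumptions and are not taken from the paper or its repository.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

EPOCHS = 300          # total training epochs (quoted)
WARMUP_EPOCHS = 20    # warm-up length (quoted)
BATCH_SIZE = 512      # global batch size (quoted)
BASE_LR = 1e-3        # initial learning rate (quoted)

# Placeholder model standing in for RaMLP; the real architecture is in the
# authors' repository (https://github.com/xiaolai-sqlai/RaMLP).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))

# weight_decay is an assumption; the paper does not quote a value here.
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)

# 20-epoch linear warm-up followed by cosine decay over the remaining epochs.
warmup = LinearLR(optimizer, start_factor=1e-3, total_iters=WARMUP_EPOCHS)
cosine = CosineAnnealingLR(optimizer, T_max=EPOCHS - WARMUP_EPOCHS)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine],
                         milestones=[WARMUP_EPOCHS])

for epoch in range(EPOCHS):
    # one pass over ImageNet-1K at batch size 512 would go here
    scheduler.step()
```

With these milestones, SequentialLR switches from the linear warm-up to cosine decay after epoch 20, matching the quoted 20-epoch warm-up. The actual RaMLP recipe, adopted from CycleMLP, also applies RandAugment, Mixup, CutMix, random erasing, and stochastic depth, which are omitted from this sketch.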