RaMLP: Vision MLP via Region-aware Mixing

Authors: Shenqi Lai, Xi Du, Jia Guo, Kaipeng Zhang

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% APb or 1.0% mIoU) on dense prediction tasks. The training code could be found at https://github.com/xiaolai-sqlai/RaMLP.
Researcher Affiliation | Collaboration | Shenqi Lai1, Xi Du2, Jia Guo1 and Kaipeng Zhang3; 1InsightFace.ai, 2Kiwi Tech, 3Shanghai AI Laboratory; laishenqi@qq.com, leo.du@kiwiar.com, guojia@gmail.com, kpzhang@foxmail.com
Pseudocode | No | The paper describes the model architecture and its components but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The training code could be found at https://github.com/xiaolai-sqlai/RaMLP.
Open Datasets | Yes | We train our models on the ImageNet-1K [Deng et al., 2009] dataset from scratch, which contains 1.2M training images and 50K validation images evenly spreading 1,000 categories. We report the top-1 accuracy on the validation set following the standard practice in this community. For fair comparisons, our training strategy is mostly adopted from CycleMLP, including RandAugment, Mixup, CutMix, random erasing, and stochastic depth. AdamW and cosine learning rate schedules with the initial value of 1×10^-3 are adopted. All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512.
Dataset Splits | Yes | We train our models on the ImageNet-1K [Deng et al., 2009] dataset from scratch, which contains 1.2M training images and 50K validation images evenly spreading 1,000 categories.
Hardware Specification | Yes | All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512.
Software Dependencies | No | The paper mentions optimizers (AdamW) and training strategies (RandAugment, Mixup, CutMix, random erasing, stochastic depth) and uses frameworks like RetinaNet and Mask R-CNN, but does not provide specific version numbers for any software dependencies (e.g., PyTorch version, Python version, specific library versions).
Experiment Setup | Yes | For fair comparisons, our training strategy is mostly adopted from CycleMLP, including RandAugment, Mixup, CutMix, random erasing, and stochastic depth. AdamW and cosine learning rate schedules with the initial value of 1×10^-3 are adopted. All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512.
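
The Experiment Setup row above boils down to a small set of hyperparameters. Below is a minimal PyTorch/timm sketch of that recipe (AdamW, cosine schedule with a 1e-3 initial learning rate, 300 epochs with a 20-epoch warm-up, batch size 512, RandAugment/Mixup/CutMix/random erasing). The stand-in backbone, the dataset path, the weight decay, and the exact augmentation magnitudes are assumptions not stated in the paper excerpt; the actual RaMLP model and full training script are in the linked repository.

```python
# Hedged sketch of the training recipe summarized above, not the authors' script.
# Placeholders/assumptions: stand-in backbone, ImageNet path, weight decay 0.05,
# RandAugment magnitude, mixup/cutmix alphas. The real RaMLP model comes from
# https://github.com/xiaolai-sqlai/RaMLP.
import torch
import timm
from timm.data import create_transform, Mixup
from timm.loss import SoftTargetCrossEntropy
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

EPOCHS, WARMUP_EPOCHS, BATCH_SIZE, BASE_LR = 300, 20, 512, 1e-3
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in backbone; in practice this would be a RaMLP variant from the repo,
# with stochastic depth (drop path) configured inside the model.
model = timm.create_model("resnet18", num_classes=1000).to(device)

# Training-time augmentation: RandAugment + random erasing (magnitudes assumed).
train_tf = create_transform(
    input_size=224, is_training=True,
    auto_augment="rand-m9-mstd0.5-inc1", re_prob=0.25,
)
train_set = ImageFolder("path/to/imagenet-1k/train", transform=train_tf)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=16, pin_memory=True, drop_last=True)

# Mixup + CutMix produce soft targets, so pair them with a soft-target loss.
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0,
                 label_smoothing=0.1, num_classes=1000)
criterion = SoftTargetCrossEntropy()

optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)

# 20-epoch linear warm-up followed by cosine decay over the remaining epochs.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, total_iters=WARMUP_EPOCHS)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS - WARMUP_EPOCHS)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[WARMUP_EPOCHS])

for epoch in range(EPOCHS):
    model.train()
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        images, targets = mixup_fn(images, targets)
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # epoch-level schedule, as in CycleMLP-style recipes
```

The batch size of 512 in the paper is the global batch size across GPUs; a multi-GPU launcher (e.g., torchrun with DistributedDataParallel) would split it per device, which this single-process sketch omits for brevity.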