RaMLP: Vision MLP via Region-aware Mixing

Authors: Shenqi Lai, Xi Du, Jia Guo, Kaipeng Zhang

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% APb or 1.0% mIoU) on dense prediction tasks. The training code could be found at https://github.com/xiaolai-sqlai/RaMLP.
Researcher Affiliation | Collaboration | Shenqi Lai1, Xi Du2, Jia Guo1 and Kaipeng Zhang3; 1InsightFace.ai, 2Kiwi Tech, 3Shanghai AI Laboratory; laishenqi@qq.com, leo.du@kiwiar.com, guojia@gmail.com, kpzhang@foxmail.com
Pseudocode | No | The paper describes the model architecture and its components but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | The training code could be found at https://github.com/xiaolai-sqlai/RaMLP.
Open Datasets | Yes | We train our models on the ImageNet-1K [Deng et al., 2009] dataset from scratch, which contains 1.2M training images and 50K validation images evenly spreading 1,000 categories. We report the top-1 accuracy on the validation set following the standard practice in this community. For fair comparisons, our training strategy is mostly adopted from CycleMLP, including RandAugment, Mixup, CutMix, random erasing, and stochastic depth. AdamW and cosine learning rate schedules with the initial value of 1×10^-3 are adopted. All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512.
Dataset Splits | Yes | We train our models on the ImageNet-1K [Deng et al., 2009] dataset from scratch, which contains 1.2M training images and 50K validation images evenly spreading 1,000 categories.
Hardware Specification | Yes | All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512.
Software Dependencies | No | The paper mentions optimizers (AdamW) and training strategies (RandAugment, Mixup, CutMix, random erasing, stochastic depth) and uses frameworks like RetinaNet and Mask R-CNN, but does not provide specific version numbers for any software dependencies (e.g., PyTorch version, Python version, specific library versions).
Experiment Setup | Yes | For fair comparisons, our training strategy is mostly adopted from CycleMLP, including RandAugment, Mixup, CutMix, random erasing, and stochastic depth. AdamW and cosine learning rate schedules with the initial value of 1×10^-3 are adopted. All models are trained for 300 epochs with a 20-epoch warm-up on Nvidia 3090 GPUs with a batch size of 512.
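
The Experiment Setup row above boils down to a small set of hyperparameters. Below is a minimal PyTorch/timm sketch of that recipe (AdamW, cosine schedule with a 1e-3 initial learning rate, 300 epochs with a 20-epoch warm-up, batch size 512, RandAugment/Mixup/CutMix/random erasing). The stand-in backbone, the dataset path, the weight decay, and the exact augmentation magnitudes are assumptions not stated in the paper excerpt; the actual RaMLP model and full training script are in the linked repository.

```python
# Hedged sketch of the training recipe summarized above, not the authors' script.
# Placeholders/assumptions: stand-in backbone, ImageNet path, weight decay 0.05,
# RandAugment magnitude, mixup/cutmix alphas. The real RaMLP model comes from
# https://github.com/xiaolai-sqlai/RaMLP.
import torch
import timm
from timm.data import create_transform, Mixup
from timm.loss import SoftTargetCrossEntropy
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder

EPOCHS, WARMUP_EPOCHS, BATCH_SIZE, BASE_LR = 300, 20, 512, 1e-3
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in backbone; in practice this would be a RaMLP variant from the repo,
# with stochastic depth (drop path) configured inside the model.
model = timm.create_model("resnet18", num_classes=1000).to(device)

# Training-time augmentation: RandAugment + random erasing (magnitudes assumed).
train_tf = create_transform(
    input_size=224, is_training=True,
    auto_augment="rand-m9-mstd0.5-inc1", re_prob=0.25,
)
train_set = ImageFolder("path/to/imagenet-1k/train", transform=train_tf)
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True,
                          num_workers=16, pin_memory=True, drop_last=True)

# Mixup + CutMix produce soft targets, so pair them with a soft-target loss.
mixup_fn = Mixup(mixup_alpha=0.8, cutmix_alpha=1.0,
                 label_smoothing=0.1, num_classes=1000)
criterion = SoftTargetCrossEntropy()

optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR, weight_decay=0.05)

# 20-epoch linear warm-up followed by cosine decay over the remaining epochs.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, total_iters=WARMUP_EPOCHS)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS - WARMUP_EPOCHS)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[WARMUP_EPOCHS])

for epoch in range(EPOCHS):
    model.train()
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        images, targets = mixup_fn(images, targets)
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()  # epoch-level schedule, as in CycleMLP-style recipes
```

The batch size of 512 in the paper is the global batch size across GPUs; a multi-GPU launcher (e.g., torchrun with DistributedDataParallel) would split it per device, which this single-process sketch omits for brevity.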