DynaMixer: A Vision MLP Architecture with Dynamic Mixing

Authors: Ziyu Wang, Wenhao Jiang, Yiming M Zhu, Li Yuan, Yibing Song, Wei Liu

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset without extra training data, performing favorably against state-of-the-art vision MLP models. When the number of parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy, surpassing existing MLP-like models of similar capacity. The code is available at https://github.com/ziyuwwang/DynaMixer. In this section, we present the experimental results and analysis. First, we give the configurations of DynaMixer used in the experiments; then the experimental settings and results on the ImageNet-1K dataset are provided. Finally, ablation studies are presented to provide a deeper understanding of the designs in our model.
Researcher Affiliation | Collaboration | Ziyu Wang (1), Wenhao Jiang (1), Yiming Zhu (2), Li Yuan (3), Yibing Song (4), Wei Liu (1). (1) Data Platform, Tencent; (2) Graduate School at Shenzhen, Tsinghua University; (3) School of Electrical and Computer Engineering, Peking University; (4) Tencent AI Lab.
Pseudocode | Yes | Algorithm 1: Pseudo-code for the DynaMixer block (PyTorch-like).

###### initialization ######
proj_c = nn.Linear(D, D)
proj_o = nn.Linear(D, D)

###### code in forward ######
def dyna_mixer_block(self, X):
    H, W, D = X.shape
    # row mixing
    for h = 1:H
        Y_h[h,:,:] = DynaMixerOp_h(X[h,:,:])
    # column mixing
    for w = 1:W
        Y_w[:,w,:] = DynaMixerOp_w(X[:,w,:])
    # channel mixing
    Y_c = proj_c(X)
    Y_out = Y_h + Y_w + Y_c
    return proj_o(Y_out)
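The quoted pseudocode is not directly runnable, and the internals of DynaMixerOp are not shown in this excerpt. The following is a minimal, runnable PyTorch sketch of the block structure only: SimpleDynaMixerOp is a hypothetical stand-in that predicts a token-mixing matrix from the tokens themselves and applies it; all names and shapes beyond the pseudocode are assumptions, not the authors' implementation.

import torch
import torch.nn as nn


class SimpleDynaMixerOp(nn.Module):
    # Hypothetical stand-in: predicts an (N x N) mixing matrix from the N tokens
    # themselves, softmaxes it row-wise, and applies it to mix the tokens.
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.num_tokens = num_tokens
        self.to_weights = nn.Linear(num_tokens * dim, num_tokens * num_tokens)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, D)
        B, N, D = tokens.shape
        weights = self.to_weights(tokens.reshape(B, N * D))    # (B, N*N)
        mixing = weights.reshape(B, N, N).softmax(dim=-1)      # dynamic mixing matrix
        return torch.bmm(mixing, tokens)                       # (B, N, D)


class DynaMixerBlock(nn.Module):
    # Row mixing + column mixing + channel mixing, following the quoted pseudocode.
    def __init__(self, height: int, width: int, dim: int):
        super().__init__()
        self.op_h = SimpleDynaMixerOp(width, dim)   # mixes the W tokens of each row
        self.op_w = SimpleDynaMixerOp(height, dim)  # mixes the H tokens of each column
        self.proj_c = nn.Linear(dim, dim)           # channel mixing
        self.proj_o = nn.Linear(dim, dim)           # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, D)
        B, H, W, D = x.shape
        # Row mixing: each of the H rows is a sequence of W tokens.
        y_h = self.op_h(x.reshape(B * H, W, D)).reshape(B, H, W, D)
        # Column mixing: each of the W columns is a sequence of H tokens.
        y_w = self.op_w(x.transpose(1, 2).reshape(B * W, H, D)).reshape(B, W, H, D)
        y_w = y_w.transpose(1, 2)
        # Channel mixing.
        y_c = self.proj_c(x)
        return self.proj_o(y_h + y_w + y_c)


# Dummy usage on a 14x14 grid of 64-dimensional tokens.
block = DynaMixerBlock(height=14, width=14, dim=64)
print(block(torch.randn(2, 14, 14, 64)).shape)   # torch.Size([2, 14, 14, 64])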
Open Source Code | Yes | The code is available at https://github.com/ziyuwwang/DynaMixer.
Open Datasets | Yes | We train our proposed DynaMixer on the public image classification benchmark ImageNet-1K dataset (Deng et al., 2009), which covers 1K categories of natural images and contains 1.2M training images and 50K validation images.
Dataset Splits | Yes | We train our proposed DynaMixer on the public image classification benchmark ImageNet-1K dataset (Deng et al., 2009), which covers 1K categories of natural images and contains 1.2M training images and 50K validation images. Because the test set for this benchmark is unlabeled, we follow the common practice of evaluating performance on the validation set.
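As a rough illustration of the split described above, the sketch below loads the ImageNet-1K training and validation sets with torchvision; the directory layout, transforms, and the "/data/imagenet" path are assumptions (the paper trains with the TIMM pipeline), not the authors' setup.

import torchvision.datasets as datasets
import torchvision.transforms as transforms

train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# 1.2M training images and 50K validation images over 1K classes; evaluation uses
# the validation split because the official test labels are not public.
train_set = datasets.ImageFolder("/data/imagenet/train", transform=train_transform)
val_set = datasets.ImageFolder("/data/imagenet/val", transform=val_transform)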
Hardware Specification | Yes | We train our model on one machine with 8 NVIDIA A100 GPUs with data parallelism. We test the throughput of our model, ViP, and ResMLP with a batch size of 32 on a single NVIDIA V100 GPU.
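A throughput test of this kind can be reproduced with a simple timing loop such as the sketch below (batch size 32, forward passes only); the warm-up and iteration counts, input size, and function name are assumptions rather than the paper's benchmarking code.

import time
import torch

def measure_throughput(model, batch_size=32, image_size=224, warmup=10, iters=50):
    model = model.cuda().eval()
    images = torch.randn(batch_size, 3, image_size, image_size, device="cuda")
    with torch.no_grad():
        for _ in range(warmup):            # warm-up passes, not timed
            model(images)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(iters):
            model(images)
        torch.cuda.synchronize()
    return batch_size * iters / (time.time() - start)   # images per second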
Software Dependencies | No | The code implementation is based on PyTorch (Paszke et al., 2019) and the TIMM toolbox. We use PyTorch's automatic mixed precision for training acceleration. The paper mentions software names but does not provide specific version numbers for PyTorch or TIMM.
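The excerpt only states that PyTorch automatic mixed precision is used; a minimal training-step sketch with torch.cuda.amp might look like the following, where the model, optimizer, criterion, and data are placeholders rather than the paper's code.

import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(model, optimizer, criterion, images, targets):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():           # forward pass in mixed precision
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()             # scale loss to avoid fp16 underflow
    scaler.step(optimizer)                    # unscale gradients and update weights
    scaler.update()
    return loss.item()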
Experiment Setup | Yes | For model optimization, we adopt the AdamW optimizer (Loshchilov & Hutter, 2017). The learning rates for DynaMixer-S, DynaMixer-M, and DynaMixer-L are 0.002, and the corresponding batch sizes on one GPU are 256, 128, and 64, respectively. We set the weight decay rate to 0.05 and the warm-up learning rate to 10^-6, following the settings in previous work (Touvron et al., 2021b; Jiang et al., 2021). The model is trained for 300 epochs. For data augmentation, we use CutOut (Zhong et al., 2020), RandAug (Cubuk et al., 2020), MixUp (Zhang et al., 2017), and CutMix (Yun et al., 2019).
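Collecting only the hyper-parameters quoted above, a minimal configuration sketch with torch.optim.AdamW could look like this; the learning-rate schedule and the wiring of the listed augmentations (CutOut, RandAug, MixUp, CutMix via TIMM) are not specified in the excerpt and are therefore omitted.

import torch

def build_optimizer(model):
    # Quoted values: learning rate 0.002, weight decay 0.05.
    return torch.optim.AdamW(model.parameters(), lr=2e-3, weight_decay=0.05)

EPOCHS = 300                 # quoted training length
WARMUP_LR = 1e-6             # quoted warm-up learning rate
PER_GPU_BATCH_SIZE = {       # quoted per-GPU batch sizes
    "DynaMixer-S": 256,
    "DynaMixer-M": 128,
    "DynaMixer-L": 64,
}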