DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Authors: Ziyu Wang, Wenhao Jiang, Yiming M Zhu, Li Yuan, Yibing Song, Wei Liu
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our proposed DynaMixer model (97M parameters) achieves 84.3% top-1 accuracy on the ImageNet-1K dataset without extra training data, performing favorably against the state-of-the-art vision MLP models. When the number of parameters is reduced to 26M, it still achieves 82.7% top-1 accuracy, surpassing the existing MLP-like models with a similar capacity. The code is available at https://github.com/ziyuwwang/DynaMixer. In this section, we present the experimental results and analysis. First, we will give the configurations of our DynaMixer used in the experiments, and then the experimental settings and results on the ImageNet-1K dataset are provided. At last, the ablation studies are presented to provide a deep understanding of the designs in our model. |
| Researcher Affiliation | Collaboration | Ziyu Wang¹, Wenhao Jiang¹, Yiming Zhu², Li Yuan³, Yibing Song⁴, Wei Liu¹ — ¹Data Platform, Tencent; ²Graduate School at Shenzhen, Tsinghua University; ³School of Electrical and Computer Engineering, Peking University; ⁴Tencent AI Lab. |
| Pseudocode | Yes | Algorithm 1 Pseudo-code for DynaMixer Block (PyTorch-like) ###### initialization ####### proj_c = nn.Linear(D, D) proj_o = nn.Linear(D, D) ###### code in forward ###### def dyna_mixer_block(self, X): H, W, D = X.shape # row mixing for h = 1:H Y_h[h,:,:] = DynaMixerOp_h(X[h,:,:]) # column mixing for w = 1:W Y_w[:,w,:] = DynaMixerOp_w(X[:,w,:]) # channel mixing Y_c = proj_c(X) Y_out = Y_h + Y_w + Y_c return proj_o(Y_out) *(A runnable PyTorch sketch based on this pseudocode is given after the table.)* |
| Open Source Code | Yes | The code is available at https://github.com/ziyuwwang/DynaMixer. |
| Open Datasets | Yes | We train our proposed DynaMixer on the public image classification benchmark ImageNet-1K dataset (Deng et al., 2009), which covers 1K categories of natural images and contains 1.2M training images and 50K validation images. |
| Dataset Splits | Yes | We train our proposed DynaMixer on the public image classification benchmark ImageNet-1K dataset (Deng et al., 2009), which covers 1K categories of natural images and contains 1.2M training images and 50K validation images. Because the test set for this benchmark is unlabeled, we follow the common practice by evaluating the performance on the validation set. |
| Hardware Specification | Yes | We train our model on one machine with 8 NVIDIA A100 GPUs with data parallelism. We test the throughput of our model, ViP, and ResMLP with a batch size of 32 on a single NVIDIA V100 GPU. |
| Software Dependencies | No | The code implementation is based on PyTorch (Paszke et al., 2019) and the TIMM toolbox. We use automatic mixed precision in PyTorch for training acceleration. The paper mentions software names but does not provide specific version numbers for PyTorch or TIMM. |
| Experiment Setup | Yes | For model optimization, we adopt the AdamW optimizer (Loshchilov & Hutter, 2017). The learning rates for DynaMixer-S, DynaMixer-M, and DynaMixer-L are 0.002, and the corresponding batch sizes on one GPU are 256, 128, and 64, respectively. We set the weight decay rate to 0.05 and set the warmup learning rate to 1e-6 to follow the settings in previous work (Touvron et al., 2021b; Jiang et al., 2021). The model is trained for 300 epochs. For data augmentation methods, we use CutOut (Zhong et al., 2020), RandAug (Cubuk et al., 2020), MixUp (Zhang et al., 2017), and CutMix (Yun et al., 2019). |
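
The pseudocode quoted in the Pseudocode row can be expanded into a minimal runnable PyTorch sketch. The `DynaMixerOp` below is an illustrative simplification: it dynamically generates a token-mixing matrix from the input and applies a softmax over tokens, whereas the paper's operator also reduces the channel dimension and mixes per channel segment; the class names, the `reduced_dim` value, and the internal output projection are assumptions, so consult the official repository for the exact implementation.

```python
# Minimal sketch of a DynaMixer block following Algorithm 1 (hedged, simplified).
import torch
import torch.nn as nn


class DynaMixerOp(nn.Module):
    """Dynamically generates an N x N mixing matrix from an (N, D) token sequence."""

    def __init__(self, num_tokens: int, dim: int, reduced_dim: int = 2):
        super().__init__()
        self.reduce = nn.Linear(dim, reduced_dim)                 # compress channels
        self.generate = nn.Linear(num_tokens * reduced_dim, num_tokens * num_tokens)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):                                         # x: (B, N, D)
        b, n, d = x.shape
        weights = self.generate(self.reduce(x).reshape(b, -1))    # (B, N*N)
        weights = weights.reshape(b, n, n).softmax(dim=-1)        # dynamic mixing matrix
        return self.out_proj(torch.matmul(weights, x))            # mix tokens


class DynaMixerBlock(nn.Module):
    """Row mixing + column mixing + channel mixing, summed and projected (Algorithm 1)."""

    def __init__(self, height: int, width: int, dim: int):
        super().__init__()
        self.mix_h = DynaMixerOp(width, dim)    # mixes the W tokens within each row
        self.mix_w = DynaMixerOp(height, dim)   # mixes the H tokens within each column
        self.proj_c = nn.Linear(dim, dim)       # channel mixing
        self.proj_o = nn.Linear(dim, dim)       # output projection

    def forward(self, x):                        # x: (B, H, W, D)
        b, h, w, d = x.shape
        y_h = self.mix_h(x.reshape(b * h, w, d)).reshape(b, h, w, d)
        y_w = self.mix_w(x.permute(0, 2, 1, 3).reshape(b * w, h, d))
        y_w = y_w.reshape(b, w, h, d).permute(0, 2, 1, 3)
        y_c = self.proj_c(x)
        return self.proj_o(y_h + y_w + y_c)


if __name__ == "__main__":
    block = DynaMixerBlock(height=14, width=14, dim=192)
    print(block(torch.randn(2, 14, 14, 192)).shape)  # torch.Size([2, 14, 14, 192])
```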
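
Likewise, the Experiment Setup row can be read as a concrete optimization recipe. The sketch below wires up only the quantities stated in the excerpt (AdamW, learning rate 0.002, weight decay 0.05, warmup learning rate 1e-6, 300 epochs); the warmup length and the cosine decay after warmup are placeholders not confirmed by the excerpt.

```python
# Sketch of the stated optimization settings; warmup length and post-warmup
# schedule are assumptions (the excerpt does not specify them).
import math
import torch

model = torch.nn.Linear(192, 1000)          # stand-in for a DynaMixer model
base_lr, warmup_lr, warmup_epochs, epochs = 2e-3, 1e-6, 5, 300

optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr, weight_decay=0.05)

def lr_at(epoch: int) -> float:
    """Linear warmup from warmup_lr to base_lr, then assumed cosine decay."""
    if epoch < warmup_epochs:
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

for epoch in range(epochs):
    for group in optimizer.param_groups:
        group["lr"] = lr_at(epoch)
    # ... train one epoch on ImageNet-1K batches augmented with
    #     CutOut / RandAug / MixUp / CutMix, as listed in the table ...
```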