MIA-Former: Efficient and Robust Vision Transformers via Multi-Grained Input-Adaptation
Authors: Zhongzhi Yu, Yonggan Fu, Sicheng Li, Chaojian Li, Yingyan Lin
AAAI 2022, pp. 8962-8970 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments and ablation studies validate that the proposed MIA-Former framework can (1) effectively allocate computation budgets adaptive to the difficulty of input images, achieving state-of-the-art (SOTA) accuracy-efficiency trade-offs, e.g., 20% computation savings with the same or even a higher accuracy compared with SOTA dynamic transformer models, and (2) boost ViTs' robust accuracy under various adversarial attacks over their static counterparts by 2.4% and 3.0%, respectively. Our code is available at https://github.com/RICE-EIC/MIA-Former. |
| Researcher Affiliation | Collaboration | Zhongzhi Yu1, Yonggan Fu1, Sicheng Li2, Chaojian Li1, Yingyan Lin1 1 Department of Electrical and Computer Engineering, Rice University 2 Alibaba DAMO Academy |
| Pseudocode | No | The paper describes the training process in text but does not include any explicit “pseudocode” or “algorithm” blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/RICE-EIC/MIA-Former. |
| Open Datasets | Yes | We evaluate our proposed MIA-Former over three ViT models (i.e., DeiT-Small (Touvron et al. 2021), LeViT-192 and LeViT-256 (Graham et al. 2021)) on the ImageNet-1K dataset (Deng et al. 2009). |
| Dataset Splits | Yes | We evaluate our proposed MIA-Former over three ViT models (i.e., DeiT-Small (Touvron et al. 2021), LeViT-192 and LeViT-256 (Graham et al. 2021)) on the ImageNet-1K dataset (Deng et al. 2009). We first summarize the statistical characteristics of the generated skipping policy on the validation set of ImageNet-1K (Deng et al. 2009). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Adam and AdamW optimizers but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Stage 1: MIA-Controller pretraining: we use an Adam (Kingma and Ba 2014) optimizer with a learning rate of 1e-4 to train the MIA-Controller with the MIA-Block fixed until Lpretrain decreases to 0. Stage 2: MIA-Former co-training: we use an AdamW (Loshchilov and Hutter 2017) optimizer with a batch size of 1024 and learning rates of 1e-5/1e-3 to train the MIA-Block/MIA-Controller, respectively, for 200 epochs. We set α to 0.1 · Lcls / Lcost. Stage 3: Skipping policy finetuning with hybrid RL: after inserting the RL agents, we first train the RL agent for 20 epochs with all other parameters fixed and then unfreeze the other parameters and co-train the MIA-Former for a total of 50 epochs. (See the training-schedule sketch below the table.) |
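
To make the three-stage schedule in the Experiment Setup row concrete, the snippet below is a minimal PyTorch sketch of the optimizer configuration it describes. The module classes `MIAController` and `MIABlock` are illustrative placeholders, not the authors' actual API (the real implementation is in https://github.com/RICE-EIC/MIA-Former), and the loss-weighting helper assumes the reading α = 0.1 · Lcls / Lcost stated above.

```python
# Sketch of the Stage 1 / Stage 2 optimizer setup described in the paper.
# MIAController and MIABlock are illustrative placeholders, not the authors' code.
import torch
import torch.nn as nn


class MIAController(nn.Module):
    """Placeholder lightweight controller that predicts per-layer skip logits."""
    def __init__(self, dim=192, num_layers=12):
        super().__init__()
        self.head = nn.Linear(dim, 2 * num_layers)  # MSA/MLP skip decision per layer

    def forward(self, x):
        return self.head(x.mean(dim=1))


class MIABlock(nn.Module):
    """Placeholder stand-in for the adaptive transformer backbone."""
    def __init__(self, dim=192, num_classes=1000):
        super().__init__()
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, x):
        return self.classifier(self.encoder(x).mean(dim=1))


backbone, controller = MIABlock(), MIAController()

# Stage 1: pretrain the MIA-Controller with the MIA-Block frozen (Adam, lr 1e-4).
for p in backbone.parameters():
    p.requires_grad = False
stage1_opt = torch.optim.Adam(controller.parameters(), lr=1e-4)

# Stage 2: co-train with AdamW, batch size 1024, for 200 epochs;
# 1e-5 for the MIA-Block and 1e-3 for the MIA-Controller (as read from the excerpt).
for p in backbone.parameters():
    p.requires_grad = True
stage2_opt = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "lr": 1e-5},
        {"params": controller.parameters(), "lr": 1e-3},
    ]
)


def total_loss(l_cls, l_cost):
    # Assumed weighting: L = L_cls + alpha * L_cost with alpha = 0.1 * L_cls / L_cost,
    # so the cost term contributes roughly 10% of the classification loss.
    alpha = 0.1 * l_cls.detach() / l_cost.detach().clamp(min=1e-8)
    return l_cls + alpha * l_cost
```

Stage 3 (the hybrid-RL finetuning of the skipping policy) is not sketched here, since the excerpt does not specify the RL agent's optimizer or interface.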