MIA-Former: Efficient and Robust Vision Transformers via Multi-Grained Input-Adaptation

Authors: Zhongzhi Yu, Yonggan Fu, Sicheng Li, Chaojian Li, Yingyan Lin

AAAI 2022, pp. 8962-8970

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments and ablation studies validate that the proposed MIA-Former framework can (1) effectively allocate computation budgets adaptive to the difficulty of input images, achieving state-of-the-art (SOTA) accuracy-efficiency trade-offs, e.g., 20% computation savings with the same or even a higher accuracy compared with SOTA dynamic transformer models, and (2) boost ViTs' robustness accuracy under various adversarial attacks over their static counterparts by 2.4% and 3.0%, respectively. Our code is available at https://github.com/RICE-EIC/MIA-Former.
Researcher Affiliation | Collaboration | Zhongzhi Yu (1), Yonggan Fu (1), Sicheng Li (2), Chaojian Li (1), Yingyan Lin (1); (1) Department of Electrical and Computer Engineering, Rice University; (2) Alibaba DAMO Academy
Pseudocode | No | The paper describes the training process in text but does not include any explicit "pseudocode" or "algorithm" blocks.
Open Source Code | Yes | Our code is available at https://github.com/RICE-EIC/MIA-Former.
Open Datasets | Yes | We evaluate our proposed MIA-Former over three ViT models (i.e., DeiT-Small (Touvron et al. 2021), LeViT-192 and LeViT-256 (Graham et al. 2021)) on the ImageNet-1K dataset (Deng et al. 2009).
Dataset Splits | Yes | We evaluate our proposed MIA-Former over three ViT models (i.e., DeiT-Small (Touvron et al. 2021), LeViT-192 and LeViT-256 (Graham et al. 2021)) on the ImageNet-1K dataset (Deng et al. 2009). We first summarize the statistical characteristic of the generated skipping policy on the validation set of ImageNet-1K (Deng et al. 2009). (An illustrative evaluation sketch appears below this table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the Adam and AdamW optimizers but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | Stage 1 (MIA-Controller pretraining): we use an Adam (Kingma and Ba 2014) optimizer with a learning rate of 1e-4 to train the MIA-Controller with the MIA-Block fixed until Lpretrain decreases to 0. Stage 2 (MIA-Former co-training): we use an AdamW (Loshchilov and Hutter 2017) optimizer with a batch size of 1024 and learning rates of 1e-5/1e-3 to train the MIA-Block/MIA-Controller, respectively, for 200 epochs; we set α to 0.1 · Lcls/Lcost. Stage 3 (skipping policy finetuning with hybrid RL): after inserting the RL agents, we first train the RL agent for 20 epochs with all other parameters fixed and then unfreeze the other parameters and co-train the MIA-Former for a total of 50 epochs.
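
To make the staged schedule in the Experiment Setup row concrete, here is a minimal PyTorch sketch of the optimizer configuration only. The attribute names (model.mia_block, model.mia_controller) are hypothetical placeholders rather than the authors' implementation, and the 1e-3 controller learning rate assumes the "1e3" in the extracted text is a typo for 1e-3.

    # Sketch of the three-stage optimizer setup; only the optimizer types,
    # learning rates, and schedule lengths follow the text above. The module
    # names below are placeholders, not the authors' code.
    import torch

    def stage1_optimizer(model):
        # Stage 1: pretrain the MIA-Controller with the MIA-Block frozen (Adam, lr 1e-4).
        for p in model.mia_block.parameters():
            p.requires_grad = False
        return torch.optim.Adam(model.mia_controller.parameters(), lr=1e-4)

    def stage2_optimizer(model):
        # Stage 2: co-train MIA-Block (lr 1e-5) and MIA-Controller (lr 1e-3, assumed)
        # with AdamW, batch size 1024, for 200 epochs.
        for p in model.parameters():
            p.requires_grad = True
        return torch.optim.AdamW([
            {"params": model.mia_block.parameters(), "lr": 1e-5},
            {"params": model.mia_controller.parameters(), "lr": 1e-3},
        ])

    # Stage 3: train only the inserted RL agent for 20 epochs with all other
    # parameters frozen, then unfreeze and co-train the full MIA-Former for a
    # total of 50 epochs.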
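
The Open Datasets and Dataset Splits rows report evaluation of DeiT/LeViT models on the ImageNet-1K validation set. Below is a minimal sketch of such an evaluation using timm and torchvision; the dataset path, batch size, and the deit_small_patch16_224 model name are illustrative assumptions, not taken from the paper or its repository.

    # Minimal sketch: top-1 accuracy of a pretrained DeiT-Small model on the
    # ImageNet-1K validation split. Assumes a torchvision-style ImageNet
    # directory layout; path and model choice are illustrative.
    import torch
    import timm
    from torchvision import datasets, transforms
    from torch.utils.data import DataLoader

    val_tf = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=val_tf)  # hypothetical path
    val_loader = DataLoader(val_set, batch_size=256, num_workers=8)

    model = timm.create_model("deit_small_patch16_224", pretrained=True).eval()

    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    print(f"top-1 accuracy: {correct / total:.4f}")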