Dynamic Token Normalization Improves Vision Transformers

Authors: Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that the transformer equipped with DTN consistently outperforms the baseline model with minimal extra parameters and computational overhead.
Researcher Affiliation | Collaboration | 1 The Chinese University of Hong Kong; 2 ARC Lab, Tencent PCG; 3 AI Technology Center of Tencent Video; 4 The University of Hong Kong
Pseudocode | Yes | Algorithm 1: Forward pass of DTN. (A hedged sketch of the idea follows the table.)
Open Source Code | Yes | Codes will be made public at https://github.com/wqshao126/DTN.
Open Datasets | Yes | Extensive experiments such as image classification on ImageNet (Russakovsky et al., 2015), robustness on ImageNet-C (Hendrycks & Dietterich, 2019), self-supervised pre-training of ViTs (Caron et al., 2021), and ListOps on Long-Range Arena (Tay et al., 2021) show that DTN can achieve better performance with minimal extra parameters and a marginal increase in computational overhead compared to existing approaches.
Dataset Splits | Yes | ImageNet: We evaluate the performance of our proposed DTN using ViT models of different sizes on ImageNet, which consists of 1.28M training images and 50k validation images.
Hardware Specification | No | The paper mentions training on "all GPUs" but does not specify exact GPU/CPU models or other detailed hardware specifications.
Software Dependencies | No | The paper mentions frameworks such as MMDetection but does not provide version numbers for any key software components or libraries.
Experiment Setup | Yes | We train ViT with our proposed DTN following the training framework of DeiT (Touvron et al., 2021), where the ViT models are trained with a total batch size of 1024 on all GPUs. We use the Adam optimizer with a momentum of 0.9 and weight decay of 0.05. A cosine learning-rate schedule is adopted with an initial learning rate of 0.0005. (See the configuration sketch after the table.)
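
Since the paper's Algorithm 1 is not reproduced in this report, the following is a minimal sketch of the core idea: DTN normalizes each attention head's channels with a learnable blend of intra-token (LayerNorm-style) and inter-token statistics. The module name `DynamicTokenNorm`, the per-head mixing parameter `lam`, and the use of a plain cross-token average for the inter-token branch are simplifying assumptions of this sketch; the paper's algorithm uses a position-aware inter-token formulation, and the official code at https://github.com/wqshao126/DTN is the authoritative implementation.

```python
import torch
import torch.nn as nn


class DynamicTokenNorm(nn.Module):
    """Hedged sketch of DTN: blends intra-token (LayerNorm-style) and
    inter-token statistics with a learnable per-head weight.

    NOTE: illustrative simplification. The paper's Algorithm 1 uses
    position-aware inter-token statistics; here the inter-token branch
    is a plain average over tokens (an assumption, not the authors' method).
    """

    def __init__(self, dim: int, num_heads: int = 6, eps: float = 1e-6):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.eps = eps
        self.gamma = nn.Parameter(torch.ones(dim))   # affine scale
        self.beta = nn.Parameter(torch.zeros(dim))   # affine shift
        # One learnable logit per head; sigmoid gives the blend weight lambda.
        # Initialized at 0, i.e. an even 0.5/0.5 blend (another assumption).
        self.lam = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) -> split channels into heads
        b, t, d = x.shape
        xh = x.reshape(b, t, self.num_heads, d // self.num_heads)

        # Intra-token statistics (LayerNorm-style, over each token's channels)
        mu_intra = xh.mean(dim=-1, keepdim=True)
        var_intra = xh.var(dim=-1, unbiased=False, keepdim=True)

        # Inter-token statistics (per channel, averaged across all tokens)
        mu_inter = xh.mean(dim=1, keepdim=True)
        var_inter = xh.var(dim=1, unbiased=False, keepdim=True)

        lam = torch.sigmoid(self.lam).view(1, 1, self.num_heads, 1)
        mu = lam * mu_intra + (1.0 - lam) * mu_inter
        var = lam * var_intra + (1.0 - lam) * var_inter

        out = (xh - mu) / torch.sqrt(var + self.eps)
        return out.reshape(b, t, d) * self.gamma + self.beta
```

A layer like this could stand in for the `nn.LayerNorm` in a transformer block, e.g. `norm1 = DynamicTokenNorm(384, num_heads=6)` for a ViT-S-sized model; the dimensions here are illustrative, not taken from the paper's configs.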
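The reported recipe (DeiT training framework, total batch size 1024, Adam with momentum 0.9, weight decay 0.05, cosine schedule from an initial learning rate of 0.0005) maps onto a PyTorch setup roughly as below. The placeholder `model`, the 300-epoch run length, and the reading of "momentum of 0.9" as Adam's beta1 are assumptions; AdamW is used here so that the 0.05 weight decay is decoupled, as in the DeiT recipe, even though the paper's text says "Adam".

```python
import torch
from torch import nn

# Placeholder model standing in for a ViT equipped with DTN (assumption).
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))

num_epochs = 300          # DeiT's default schedule length (assumption)
base_lr = 5e-4            # initial learning rate reported in the paper
total_batch_size = 1024   # summed over all GPUs, per the paper

# "Adam with a momentum of 0.9" is read as beta1 = 0.9; AdamW decouples
# the 0.05 weight decay, matching the DeiT training framework.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=base_lr,
    betas=(0.9, 0.999),
    weight_decay=0.05,
)

# Cosine learning-rate schedule over the full run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_epochs
)

for epoch in range(num_epochs):
    # ... one pass over ImageNet's 1.28M training images ...
    scheduler.step()
```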