Dynamic Token Normalization Improves Vision Transformers
Authors: Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the transformer equipped with DTN consistently outperforms the baseline model with minimal extra parameters and computational overhead. |
| Researcher Affiliation | Collaboration | (1) The Chinese University of Hong Kong; (2) ARC Lab, Tencent PCG; (3) AI Technology Center of Tencent Video; (4) The University of Hong Kong |
| Pseudocode | Yes | Algorithm 1 Forward pass of DTN. |
| Open Source Code | Yes | Codes will be made public at https://github.com/wqshao126/DTN. |
| Open Datasets | Yes | Extensive experiments such as image classification on ImageNet (Russakovsky et al., 2015), robustness on ImageNet-C (Hendrycks & Dietterich, 2019), self-supervised pre-training on ViTs (Caron et al., 2021), and ListOps on Long-Range Arena (Tay et al., 2021) show that DTN can achieve better performance with minimal extra parameters and a marginal increase in computational overhead compared to existing approaches. |
| Dataset Splits | Yes | ImageNet. We evaluate the performance of our proposed DTN using ViT models with different sizes on ImageNet, which consists of 1.28M training images and 50k validation images. |
| Hardware Specification | No | The paper mentions training on "all GPUs" but does not specify exact GPU/CPU models or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using frameworks like MMDetection but does not provide specific version numbers for any key software components or libraries. |
| Experiment Setup | Yes | We train ViT with our proposed DTN by following the training framework of DeiT (Touvron et al., 2021), where the ViT models are trained with a total batch size of 1024 on all GPUs. We use Adam optimizer with a momentum of 0.9 and weight decay of 0.05. The cosine learning schedule is adopted with the initial learning rate of 0.0005. |
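
The experiment setup quoted above maps onto a standard PyTorch training configuration. The sketch below is a minimal, hypothetical illustration of those hyperparameters (total batch size 1024, Adam with beta1 = 0.9, weight decay 0.05, cosine schedule from an initial learning rate of 5e-4); the stand-in `vit_b_16` backbone, the 300-epoch horizon, and the omission of DTN layers and the DeiT augmentation pipeline are assumptions, not details confirmed by the paper.

```python
# Hypothetical sketch of the quoted optimization setup; not the authors' code.
import torch
from torchvision.models import vit_b_16

model = vit_b_16()  # placeholder backbone; in the paper, DTN replaces the normalization layers

epochs = 300             # assumed DeiT-style schedule length; not stated in the quote
total_batch_size = 1024  # summed over all GPUs, as quoted

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,                # initial learning rate from the quote
    betas=(0.9, 0.999),     # beta1 = 0.9 matches the quoted "momentum of 0.9"
    weight_decay=0.05,      # weight decay from the quote
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... one pass over ImageNet with an effective batch size of 1024 ...
    scheduler.step()
```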