Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Dynamic Token Normalization improves Vision Transformers
Authors: Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the transformer equipped with DTN consistently outperforms baseline model with minimal extra parameters and computational overhead. |
| Researcher Affiliation | Collaboration | The Chinese University of Hong Kong; ARC Lab, Tencent PCG; AI Technology Center of Tencent Video; The University of Hong Kong |
| Pseudocode | Yes | Algorithm 1: Forward pass of DTN. |
| Open Source Code | Yes | Codes will be made public at https://github.com/wqshao126/DTN. |
| Open Datasets | Yes | Extensive experiment such as image classification on ImageNet (Russakovsky et al., 2015), robustness on ImageNet-C (Hendrycks & Dietterich, 2019), self-supervised pre-training on ViTs (Caron et al., 2021), ListOps on Long-Range Arena (Tay et al., 2021) show that DTN can achieve better performance with minimal extra parameters and marginal increase of computational overhead compared to existing approaches. |
| Dataset Splits | Yes | ImageNet. We evaluate the performance of our proposed DTN using ViT models with different sizes on ImageNet, which consists of 1.28M training images and 50k validation images. |
| Hardware Specification | No | The paper mentions training on "all GPUs" but does not specify exact GPU/CPU models or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions using frameworks like MMDetection but does not provide specific version numbers for any key software components or libraries. |
| Experiment Setup | Yes | We train ViT with our proposed DTN by following the training framework of DeiT (Touvron et al., 2021) where the ViT models are trained with a total batch size of 1024 on all GPUs. We use Adam optimizer with a momentum of 0.9 and weight decay of 0.05. The cosine learning schedule is adopted with the initial learning rate of 0.0005. |
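The Experiment Setup and Dataset Splits rows together fix the batch size (1024), the optimizer hyperparameters (momentum 0.9, weight decay 0.05), a cosine schedule starting at 0.0005, and a training set of 1.28M images. The Python sketch below shows how those quoted values combine into steps per epoch and a per-step learning rate. It is not the authors' code: it assumes a pure cosine decay with no warmup, and the epoch count (300), the floor learning rate (0), and the helper names `TrainConfig`, `steps_per_epoch`, and `cosine_lr` are hypothetical, since none of them is reported in the rows above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values quoted in the Experiment Setup / Dataset Splits rows.
    batch_size: int = 1024          # total batch size across all GPUs
    base_lr: float = 5e-4           # initial learning rate (0.0005)
    momentum: float = 0.9           # Adam "momentum" (first-moment decay)
    weight_decay: float = 0.05
    train_images: int = 1_280_000   # ImageNet training images, as quoted (1.28M)
    # Hypothetical assumptions, NOT reported in the rows above:
    epochs: int = 300               # assumed number of training epochs
    min_lr: float = 0.0             # assumed floor of the cosine schedule


def steps_per_epoch(cfg: TrainConfig) -> int:
    # Optimizer steps needed to pass over every training image once.
    return math.ceil(cfg.train_images / cfg.batch_size)


def cosine_lr(step: int, total_steps: int, cfg: TrainConfig) -> float:
    """Cosine-annealed learning rate at `step`, with no warmup (assumption)."""
    progress = min(step, total_steps) / total_steps  # clamp to [0, 1]
    return cfg.min_lr + 0.5 * (cfg.base_lr - cfg.min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    cfg = TrainConfig()
    spe = steps_per_epoch(cfg)
    total = spe * cfg.epochs
    print("steps per epoch:", spe)
    for step in (0, total // 2, total):
        print(f"step {step}: lr = {cosine_lr(step, total, cfg):.6g}")
```

With the quoted numbers, one epoch is 1,280,000 / 1024 = 1250 steps; the learning rate starts at 0.0005, reaches half that value midway through training, and decays to the assumed floor at the end. DeiT-style recipes often add a linear warmup phase, which this sketch deliberately omits because the row above does not mention one.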