Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Structured Initialization for Vision Transformers
Authors: Jianqiao Zheng, Xueqian Li, Hemanth Saratchandran, Simon Lucey
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our method significantly outperforms standard ViT initialization across numerous small and medium-scale benchmarks, including Food-101, CIFAR-10, CIFAR-100, STL-10, Flowers, and Pets, while maintaining comparable performance on large-scale datasets such as ImageNet-1K. Moreover, our initialization strategy can be easily integrated into various transformer-based architectures such as Swin Transformer and MLP-Mixer, with consistent improvements in performance. |
| Researcher Affiliation | Academia | Jianqiao Zheng, Xueqian Li, Hemanth Saratchandran, Simon Lucey; Australian Institute for Machine Learning, The University of Adelaide. |
| Pseudocode | Yes | The pseudocode for our initialization strategy can be found in the appendix. (Appendix D, Algorithm 1: Convolutional Structured Impulse Initialization for ViT) |
| Open Source Code | Yes | Code is available at https://github.com/osiriszjq/structured_initialization |
| Open Datasets | Yes | All the data used in the paper are open-sourced datasets. We have provided the implementation details and have released the code to reproduce the results claimed in our paper. |
| Dataset Splits | Yes | Experiments are conducted on medium-scale datasets (≈50K training images), including Food-101 [2], CIFAR-10, and CIFAR-100 [13], as well as small-scale datasets (≈5K training images), such as STL-10 [4], Flowers [19], and Pets [22]. We follow the training recipe from [33], which has been shown to be effective for training ViT-Tiny on small and medium-scale datasets. |
| Hardware Specification | Yes | Note that, unless otherwise specified, all experiments were conducted on a single node with 8 Tesla V100 SXM3 GPUs, each with 32 GB of memory. Training each model on the small-scale datasets took about three hours, while training on ImageNet-1K took about two days. |
| Software Dependencies | No | Their code is based on the timm library [31]. By default, all weights are initialized with a truncated normal distribution. For a fair comparison, all experiments were run with identical code except for the choice of initialization method. |
| Experiment Setup | Yes | We follow the training recipe from [33], which has been shown to be effective for training ViT-Tiny on small and medium-scale datasets. Their code is based on the timm library [31]. By default, all weights are initialized with a truncated normal distribution. For a fair comparison, all experiments were run with identical code except for the choice of initialization method. Please also note that although [33] initializes the model weights from large pretrained models, we did not adopt any pretraining step for this experiment. Instead, we apply the default, mimetic, and our impulse initialization methods to ViT-Tiny models and train these models from scratch. |
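The table names the method "impulse initialization" and points to Algorithm 1 in Appendix D without reproducing it. As a hedged illustration of the underlying idea only, the sketch below builds the token-mixing matrix that an impulse convolution filter corresponds to on an `h × w` grid of patch tokens: a kernel that is 1 at a single offset `(dy, dx)` and 0 elsewhere just shifts the grid, so each output token copies exactly one input token. All function names (`impulse_mixing_matrix`, `apply_mixing`, `cross_correlate_same`, `sample_head_offsets`) are hypothetical, and this is not the authors' Algorithm 1; how they realize such a structure inside ViT attention is described in their appendix.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def impulse_mixing_matrix(h: int, w: int, dy: int, dx: int) -> Matrix:
    """(h*w) x (h*w) matrix whose row for token (i, j) holds a single 1 at
    token (i + dy, j + dx); rows whose source falls off the grid stay zero
    (equivalent to zero padding)."""
    n = h * w
    a = [[0.0] * n for _ in range(n)]
    for i in range(h):
        for j in range(w):
            si, sj = i + dy, j + dx
            if 0 <= si < h and 0 <= sj < w:
                a[i * w + j][si * w + sj] = 1.0
    return a


def apply_mixing(a: Matrix, x: List[float]) -> List[float]:
    """Mix flattened tokens, out = A @ x (one scalar feature per token)."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def cross_correlate_same(grid: Matrix, kernel: Matrix) -> Matrix:
    """'Same'-size 2D cross-correlation with zero padding and an odd-sized kernel."""
    h, w, k = len(grid), len(grid[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for u in range(k):
                for v in range(k):
                    ii, jj = i + u - r, j + v - r
                    if 0 <= ii < h and 0 <= jj < w:
                        out[i][j] += kernel[u][v] * grid[ii][jj]
    return out


def sample_head_offsets(num_heads: int, k: int = 3, seed: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical helper: one random impulse offset per head inside a k x k window."""
    rng = random.Random(seed)
    r = k // 2
    return [(rng.randint(-r, r), rng.randint(-r, r)) for _ in range(num_heads)]


if __name__ == "__main__":
    h, w, dy, dx = 3, 3, 0, 1
    grid = [[float(i * w + j) for j in range(w)] for i in range(h)]
    flat = [v for row in grid for v in row]

    # Impulse kernel: 1 at offset (dy, dx) from the center, 0 elsewhere.
    kernel = [[0.0] * 3 for _ in range(3)]
    kernel[1 + dy][1 + dx] = 1.0

    mixed = apply_mixing(impulse_mixing_matrix(h, w, dy, dx), flat)
    conv = [v for row in cross_correlate_same(grid, kernel) for v in row]
    print(mixed)          # [1.0, 2.0, 0.0, 4.0, 5.0, 0.0, 7.0, 8.0, 0.0]
    print(mixed == conv)  # True
    print(sample_head_offsets(4))
```

The final check confirms that the shift matrix and the impulse-kernel convolution produce the same output. A softmax attention map cannot contain exact zeros, and the all-zero border rows are not row-stochastic, so if such a structure is imposed through attention it can only be approximated, not reproduced exactly.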
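For contrast, the baseline in these rows is the default timm initialization, which the source describes only as "a truncated normal distribution." A minimal pure-Python sketch of that kind of initializer, using rejection sampling, follows. The standard deviation of 0.02 and the cutoff at ±2 standard deviations are assumptions for illustration, not values stated in the source. Note that PyTorch's `torch.nn.init.trunc_normal_` takes absolute cutoffs `a` and `b` rather than cutoffs measured in standard deviations. The helper `init_linear_weight` is hypothetical.

```python
import random
from typing import List


def trunc_normal(n: int, std: float = 0.02, bound: float = 2.0, seed: int = 0) -> List[float]:
    """Draw n samples from N(0, std^2), keeping only those inside
    [-bound * std, bound * std] (rejection sampling). std and bound are assumed values."""
    rng = random.Random(seed)
    limit = bound * std
    samples: List[float] = []
    while len(samples) < n:
        z = rng.gauss(0.0, std)
        if -limit <= z <= limit:
            samples.append(z)
    return samples


def init_linear_weight(in_features: int, out_features: int,
                       std: float = 0.02, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: out_features x in_features weight matrix of truncated-normal values."""
    flat = trunc_normal(in_features * out_features, std=std, seed=seed)
    return [flat[r * in_features:(r + 1) * in_features] for r in range(out_features)]


if __name__ == "__main__":
    w = init_linear_weight(4, 3)
    print(len(w), len(w[0]))                                     # 3 4
    print(max(abs(v) for row in w for v in row) <= 2.0 * 0.02)  # True
```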