Linearly Decomposing and Recomposing Vision Transformers for Diverse-Scale Models
Authors: Shuxia Lin, Miaosen Zhang, Ruiming Chen, Xu Yang, Qiufeng Wang, Xin Geng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are used to validate the effectiveness of our method: ViTs can be decomposed and the decomposed learngenes can be recomposed into diverse-scale ViTs, which can achieve comparable or better performance compared to traditional model compression and pre-training methods. The code for our experiments is available in the supplemental material. |
| Researcher Affiliation | Academia | 1School of Computer Science and Engineering, Southeast University, Nanjing 210096, China 2Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education, China {shuxialin, 230228501, 220232251, qfwang, xuyang_palm, xgeng}@seu.edu.cn |
| Pseudocode | No | The paper describes its method through textual descriptions and a pipeline diagram (Figure 3). It does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks with structured steps. |
| Open Source Code | Yes | The code for our experiments is available in the supplemental material. |
| Open Datasets | Yes | To train each learngene, we use ImageNet-1K [12], which contains approximately 1.2M training images across 1000 classes and 50K validation images. After recomposing ViT models of different layers with learngenes, we adapt them on 9 diverse downstream datasets, which include 3 object classification tasks: CIFAR-10 [26], CIFAR-100 [26], and Tiny-ImageNet [1]; 5 fine-grained classification tasks: iNaturalist-2019 [66], Food-101 [4], Oxford Flowers-102 [35], Stanford Cars [25], and Oxford-IIIT Pets [38]; 1 texture classification task: DTD [10]. |
| Dataset Splits | Yes | To train each learngene, we use ImageNet-1K [12], which contains approximately 1.2M training images across 1000 classes and 50K validation images. |
| Hardware Specification | Yes | The decomposition process is trained over 500 epochs on four NVIDIA RTX 3090 GPUs, and the recomposed models are trained over 100 epochs on two NVIDIA RTX 3090 GPUs for each downstream task. |
| Software Dependencies | No | We implement the model using PyTorch [39] and the Timm library [53]. The paper mentions these software components but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Batch Size: We employ distributed training across 4 GPUs, with each GPU handling 128 data instances, resulting in an overall batch size of 512. Optimizer: The training of each learngene is optimized using AdamW, with an initial learning rate of 0.0008 and a weight decay of 0.05. Learning Rate Schedule: We apply a cosine learning rate decay, with a warm-up period of 5 epochs. (Similar details are provided for the recomposition training settings.) |
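
The experiment setup quoted above can be summarized as a standard PyTorch/timm training configuration. The following is a minimal sketch of how those reported settings (AdamW, lr 0.0008, weight decay 0.05, cosine decay with 5-epoch warm-up, 4 GPUs × 128 instances = 512 effective batch size) might be wired together; the model name, warm-up start learning rate, and per-epoch scheduler stepping are illustrative assumptions and are not taken from the paper's released code.

```python
# Sketch of the reported decomposition-stage optimizer and schedule.
# Assumptions (not from the paper): the ViT variant, warmup_lr_init, and
# stepping the scheduler once per epoch.
import torch
import timm
from timm.scheduler import CosineLRScheduler

model = timm.create_model("deit_tiny_patch16_224", pretrained=False)  # placeholder ViT

# AdamW with lr = 0.0008 and weight decay = 0.05, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=8e-4, weight_decay=0.05)

# Cosine learning-rate decay over 500 epochs with a 5-epoch warm-up.
epochs, warmup_epochs = 500, 5
scheduler = CosineLRScheduler(
    optimizer,
    t_initial=epochs,
    warmup_t=warmup_epochs,
    warmup_lr_init=1e-6,  # warm-up start value: an assumption, not reported
)

# Effective batch size: 4 GPUs * 128 instances per GPU = 512.
per_gpu_batch_size, num_gpus = 128, 4
global_batch_size = per_gpu_batch_size * num_gpus  # 512

for epoch in range(epochs):
    scheduler.step(epoch)  # timm schedulers are updated per epoch
    # ... one training epoch over ImageNet-1K would run here ...
```

In practice the 4-GPU setting would be realized with `torch.nn.parallel.DistributedDataParallel` and a `DistributedSampler`, which is omitted here to keep the sketch focused on the reported hyperparameters.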