Vision Transformers as Probabilistic Expansion from Learngene

Authors: Qiufeng Wang, Xu Yang, Haokun Chen, Xin Geng

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate the effectiveness of PEG, outperforming traditional initialization strategies.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a direct link to a code repository.
Open Datasets | Yes | Datasets. After initializing the descendant models with the learngene, we fine-tune them on various downstream tasks, including Oxford Flowers (Nilsback & Zisserman, 2008), CUB-200-2011 (Wah et al., 2011), Stanford Cars (Gebru et al., 2017), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Food101 (Bossard et al., 2014), iNaturalist-2019 (Tan et al., 2019), ImageNet-1K (Deng et al., 2009). For detailed dataset descriptions, see Appendix A.
Dataset Splits | Yes | Table 6. Characteristics of the downstream datasets. Columns: Dataset, #Total, #Training, #Validation, #Testing, #Classes; e.g., Oxford Flowers: 8,189 total, 1,020 training, 1,020 validation, 6,149 testing, 102 classes (see the loading sketch after the table).
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU/CPU models).
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | Training settings. During the learngene expanding phase, we train the learnable parameters for 100 epochs before expanding them into descendant models of elastic scales. After this, we fine-tune these descendant models on downstream tasks for 500 epochs, which includes a 10-epoch warm-up period. The only exception is iNaturalist-2019, where we train for 100 epochs with a 5-epoch warm-up. For all tasks, the initial learning rate is set to 5 × 10⁻⁴ and a weight decay of 0.05 is applied. (See the schedule sketch after the table.)
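
Since all of the listed downstream datasets are public, the reported split counts can be checked against standard loaders. The sketch below is not from the paper: it assumes torchvision and generic ViT-style 224×224 preprocessing, and only illustrates Oxford Flowers (whose official torchvision splits should match the 1,020 / 1,020 / 6,149 counts quoted from Table 6) and CIFAR-10.

```python
# Minimal loading sketch (not from the paper). Assumes torchvision is available
# and that descendant models take 224x224 inputs, as is typical for ViTs.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),   # assumed ViT-style input size
    transforms.ToTensor(),
])

# Oxford Flowers-102: official splits are 1,020 train / 1,020 val / 6,149 test.
flowers_train = datasets.Flowers102(root="data", split="train", download=True, transform=transform)
flowers_val   = datasets.Flowers102(root="data", split="val",   download=True, transform=transform)
flowers_test  = datasets.Flowers102(root="data", split="test",  download=True, transform=transform)

# CIFAR-10 ships with fixed train/test splits.
cifar10_train = datasets.CIFAR10(root="data", train=True,  download=True, transform=transform)
cifar10_test  = datasets.CIFAR10(root="data", train=False, download=True, transform=transform)
```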
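
The Experiment Setup row pins down the schedule hyper-parameters but not the optimizer or learning-rate curve. The sketch below is a hedged reconstruction: the epoch counts, warm-up lengths, learning rate of 5 × 10⁻⁴, and weight decay of 0.05 come from the quoted settings, while AdamW with linear warm-up followed by cosine decay, and the helper `build_optimizer_and_scheduler`, are assumptions in the usual DeiT/ViT style rather than details stated in this excerpt.

```python
import math
import torch

# Epoch and warm-up counts reported in the paper; everything else is assumed.
FINETUNE_SCHEDULES = {
    "default":          {"epochs": 500, "warmup_epochs": 10},
    "inaturalist_2019": {"epochs": 100, "warmup_epochs": 5},
}

def build_optimizer_and_scheduler(model, dataset="default",
                                  base_lr=5e-4, weight_decay=0.05):
    """Hypothetical helper: AdamW + linear warm-up + cosine decay (assumed)."""
    cfg = FINETUNE_SCHEDULES[dataset]
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=base_lr, weight_decay=weight_decay)

    def lr_lambda(epoch):
        if epoch < cfg["warmup_epochs"]:
            # Linear warm-up over the first warm-up epochs.
            return (epoch + 1) / cfg["warmup_epochs"]
        # Cosine decay over the remaining epochs (assumed curve shape).
        progress = (epoch - cfg["warmup_epochs"]) / max(1, cfg["epochs"] - cfg["warmup_epochs"])
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Usage (hypothetical): pass "inaturalist_2019" to get the shorter 100/5 schedule.
# optimizer, scheduler = build_optimizer_and_scheduler(model, dataset="inaturalist_2019")
```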