Vision Transformers as Probabilistic Expansion from Learngene
Authors: Qiufeng Wang, Xu Yang, Haokun Chen, Xin Geng
Venue: ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments demonstrate the effectiveness of PEG, outperforming traditional initialization strategies. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a direct link to a code repository. |
| Open Datasets | Yes | Datasets. After initializing the descendant models with the learngene, we fine-tune them on various downstream tasks, including Oxford Flowers (Nilsback & Zisserman, 2008), CUB-200-2011 (Wah et al., 2011), Stanford Cars (Gebru et al., 2017), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Food101 (Bossard et al., 2014), iNaturalist-2019 (Tan et al., 2019), ImageNet-1K (Deng et al., 2009). For detailed dataset descriptions, see Appendix A. |
| Dataset Splits | Yes | Table 6. Characteristics of the downstream datasets (columns: Dataset, #Total, #Training, #Validation, #Testing, #Classes); e.g., Oxford Flowers: 8,189 total, 1,020 training, 1,020 validation, 6,149 testing, 102 classes. (A hedged loading sketch follows this table.) |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU/CPU models). |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Training settings. During the learngene expanding phase, we train the learnable parameters for 100 epochs before expanding them into descendant models of elastic scales. After this, we fine-tune these descendant models on downstream tasks for 500 epochs, which includes a 10-epoch warm-up period. The only exception is iNaturalist-2019, where we train for 100 epochs with a 5-epoch warm-up. For all tasks, the initial learning rate is set to 5 × 10⁻⁴ and a weight decay of 0.05 is applied. (A hedged optimizer/schedule sketch follows this table.) |
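
The split sizes quoted above are enough to sanity-check a data pipeline. The sketch below records the Oxford Flowers split from Table 6 and loads the training portion; the torchvision loader and the preprocessing transform are assumptions, since the paper does not describe its data-loading code.

```python
# Hypothetical sketch: the Oxford Flowers split reported in the paper's Table 6,
# plus one way such a split could be loaded. torchvision is an assumption; the
# paper does not name its data pipeline.
from torchvision import datasets, transforms

# As quoted above: 8,189 images = 1,020 train / 1,020 val / 6,149 test, 102 classes.
FLOWERS_SPLITS = {"train": 1_020, "val": 1_020, "test": 6_149, "classes": 102}

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

train_set = datasets.Flowers102(root="data", split="train",
                                download=True, transform=preprocess)
# The torchvision split matches the counts reported in Table 6.
assert len(train_set) == FLOWERS_SPLITS["train"]
```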
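
The training settings above translate directly into an optimizer and schedule. The sketch below is a minimal reconstruction, assuming AdamW and a cosine decay after the linear warm-up; the paper states only the epoch counts, warm-up length, initial learning rate (5 × 10⁻⁴), and weight decay (0.05), so the optimizer choice, the decay shape, and the placeholder model are assumptions.

```python
# Hypothetical sketch of the fine-tuning schedule described in the paper:
# 500 epochs with a 10-epoch warm-up (100 / 5 for iNaturalist-2019),
# initial LR 5e-4, weight decay 0.05. AdamW and cosine decay are assumptions.
import math
import torch
from torch import nn

model = nn.Linear(768, 102)  # placeholder for a descendant ViT; not the paper's model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)

total_epochs, warmup_epochs = 500, 10

def lr_lambda(epoch: int) -> float:
    """Linear warm-up to the base LR, then cosine decay to zero (assumed shape)."""
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for epoch in range(total_epochs):
    # ... one pass over the downstream training split would go here ...
    optimizer.step()   # stand-in for the actual training step
    scheduler.step()   # advance the warm-up / cosine schedule once per epoch
```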