Vision Transformers as Probabilistic Expansion from Learngene

Authors: Qiufeng Wang, Xu Yang, Haokun Chen, Xin Geng

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments demonstrate the effectiveness of PEG, outperforming traditional initialization strategies.
Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a direct link to a code repository.
Open Datasets | Yes | Datasets. After initializing the descendant models with the learngene, we fine-tune them on various downstream tasks, including Oxford Flowers (Nilsback & Zisserman, 2008), CUB-200-2011 (Wah et al., 2011), Stanford Cars (Gebru et al., 2017), CIFAR-10 (Krizhevsky et al., 2009), CIFAR-100 (Krizhevsky et al., 2009), Food101 (Bossard et al., 2014), iNaturalist-2019 (Tan et al., 2019), ImageNet-1K (Deng et al., 2009). For detailed dataset descriptions, see Appendix A.
Dataset Splits | Yes | Table 6. Characteristics of the downstream datasets. Columns: Dataset, #Total, #Training, #Validation, #Testing, #Classes; e.g., Oxford Flowers: 8,189 total, 1,020 training, 1,020 validation, 6,149 testing, 102 classes (see the loading sketch after the table).
Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU/CPU models).
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | Training settings. During the learngene expanding phase, we train the learnable parameters for 100 epochs before expanding them into descendant models of elastic scales. After this, we fine-tune these descendant models on downstream tasks for 500 epochs, which includes a 10-epoch warm-up period. The only exception is iNaturalist-2019, where we train for 100 epochs with a 5-epoch warm-up. For all tasks, the initial learning rate is set to 5 × 10⁻⁴ and a weight decay of 0.05 is applied. (See the schedule sketch after the table.)
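
Since all of the listed downstream datasets are public, the reported split counts can be checked against standard loaders. The sketch below is not from the paper: it assumes torchvision and generic ViT-style 224×224 preprocessing, and only illustrates Oxford Flowers (whose official torchvision splits should match the 1,020 / 1,020 / 6,149 counts quoted from Table 6) and CIFAR-10.

```python
# Minimal loading sketch (not from the paper). Assumes torchvision is available
# and that descendant models take 224x224 inputs, as is typical for ViTs.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),   # assumed ViT-style input size
    transforms.ToTensor(),
])

# Oxford Flowers-102: official splits are 1,020 train / 1,020 val / 6,149 test.
flowers_train = datasets.Flowers102(root="data", split="train", download=True, transform=transform)
flowers_val   = datasets.Flowers102(root="data", split="val",   download=True, transform=transform)
flowers_test  = datasets.Flowers102(root="data", split="test",  download=True, transform=transform)

# CIFAR-10 ships with fixed train/test splits.
cifar10_train = datasets.CIFAR10(root="data", train=True,  download=True, transform=transform)
cifar10_test  = datasets.CIFAR10(root="data", train=False, download=True, transform=transform)
```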
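
The Experiment Setup row pins down the schedule hyper-parameters but not the optimizer or learning-rate curve. The sketch below is a hedged reconstruction: the epoch counts, warm-up lengths, learning rate of 5 × 10⁻⁴, and weight decay of 0.05 come from the quoted settings, while AdamW with linear warm-up followed by cosine decay, and the helper `build_optimizer_and_scheduler`, are assumptions in the usual DeiT/ViT style rather than details stated in this excerpt.

```python
import math
import torch

# Epoch and warm-up counts reported in the paper; everything else is assumed.
FINETUNE_SCHEDULES = {
    "default":          {"epochs": 500, "warmup_epochs": 10},
    "inaturalist_2019": {"epochs": 100, "warmup_epochs": 5},
}

def build_optimizer_and_scheduler(model, dataset="default",
                                  base_lr=5e-4, weight_decay=0.05):
    """Hypothetical helper: AdamW + linear warm-up + cosine decay (assumed)."""
    cfg = FINETUNE_SCHEDULES[dataset]
    optimizer = torch.optim.AdamW(model.parameters(),
                                  lr=base_lr, weight_decay=weight_decay)

    def lr_lambda(epoch):
        if epoch < cfg["warmup_epochs"]:
            # Linear warm-up over the first warm-up epochs.
            return (epoch + 1) / cfg["warmup_epochs"]
        # Cosine decay over the remaining epochs (assumed curve shape).
        progress = (epoch - cfg["warmup_epochs"]) / max(1, cfg["epochs"] - cfg["warmup_epochs"])
        return 0.5 * (1.0 + math.cos(math.pi * progress))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# Usage (hypothetical): pass "inaturalist_2019" to get the shorter 100/5 schedule.
# optimizer, scheduler = build_optimizer_and_scheduler(model, dataset="inaturalist_2019")
```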