Cluster-Learngene: Inheriting Adaptive Clusters for Vision Transformers

Authors: Qiufeng Wang, Xu Yang, Fu Feng, Jing Wang, Xin Geng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experimentation, we demonstrate that Cluster-Learngene is not only more efficient than other initialization methods but also customizes models of elastic scales according to downstream task resources.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2 Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education, China. {qfwang, xuyang_palm, fufeng, wangjing91, xgeng}@seu.edu.cn
Pseudocode | Yes | The pseudocode for this phase is presented in Algorithm 1. ... Algorithm 1: Adaptively Cluster for MSA
Open Source Code | No | The paper does not provide a direct link to a code repository, nor does it explicitly state that the source code for the described methodology is publicly released or available in supplementary materials. While the NeurIPS checklist marks 'Yes' for code access, that is a self-assessment rather than a concrete statement in the paper's content.
Open Datasets | Yes | Datasets. To condense the learngene, we employ ImageNet-1K, a collection of 1.2 million training images and 50,000 validation images distributed across 1,000 classes as part of the ILSVRC2012 competition [9]. After initializing the descendant models with the learngene, we proceed to fine-tune these models on diverse downstream tasks. These tasks include iNaturalist-2019 [45], Food101 [4], Oxford Flowers [38], Stanford Cars [12], CIFAR-10 [24], CIFAR-100 [24], and CUB-200-2011 [48]. For detailed dataset descriptions, see Appendix A.2. ... Table 6: Characteristics of the downstream datasets (listing '#Training' counts for each dataset).
Dataset Splits | Yes | Datasets. To condense the learngene, we employ ImageNet-1K, a collection of 1.2 million training images and 50,000 validation images distributed across 1,000 classes as part of the ILSVRC2012 competition [9]. ... Table 6: Characteristics of the downstream datasets (listing '#Validation' counts for each dataset).
Hardware Specification | No | The paper does not specify any particular GPU models, CPU models, or other detailed hardware used for running the experiments. It only mentions 'Architectures. Both the ancestry model and descendant models are variants derived from DeiT'.
Software Dependencies | No | The paper mentions architectures like 'DeiT' and 'Swin Transformer' but does not list any specific software libraries or frameworks (e.g., PyTorch, TensorFlow) along with their version numbers.
Experiment Setup | Yes | Training settings. During the learngene clustering, we set Eps (ε) = 10 and MinHds = 1, ensuring that each attention head is included in a unique cluster. In the learngene inheriting phase, we train the descendant models on downstream tasks for 500 epochs, including a 10-epoch warm-up period, except for iNaturalist-2019, where we train for 100 epochs with a 5-epoch warm-up. The initial learning rate is set to 5 × 10^-4 for most tasks, except for Stanford Cars where it is 5 × 10^-3; the weight decay is 0.05.
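The paper's Algorithm 1 ("Adaptively Cluster for MSA") is only excerpted above, not reproduced. As a hedged illustration of the reported parameters (Eps = 10, MinHds = 1, with every head landing in some cluster), the sketch below greedily groups attention-head descriptors by distance. The function name `cluster_heads` and the single-pass grouping rule are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def cluster_heads(head_feats, eps=10.0, min_hds=1):
    """Greedy density-style grouping of attention-head descriptors.

    Heads within `eps` (Euclidean distance) of an unassigned seed head
    join its cluster; with min_hds = 1 every head ends up in a cluster,
    matching the paper's stated guarantee. This is an illustrative
    sketch, not the paper's Algorithm 1.
    """
    n = len(head_feats)
    labels = -np.ones(n, dtype=int)  # -1 marks "not yet clustered"
    cur = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = cur  # head i seeds a new cluster
        for j in range(i + 1, n):
            if labels[j] == -1 and np.linalg.norm(head_feats[i] - head_feats[j]) <= eps:
                labels[j] = cur  # absorb nearby unassigned heads
        cur += 1
    return labels
```

With Eps = 10, two nearby head descriptors share a cluster while a distant one forms its own, so the number of clusters adapts to how the heads are distributed.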
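The fine-tuning hyperparameters quoted in the Experiment Setup row can be collected into a small config sketch. The dict layout and the warmup-then-cosine schedule below are assumptions (a common DeiT-style recipe; the paper excerpt gives epochs, warm-up, learning rate, and weight decay but does not spell out the decay rule):

```python
import math

# Hyperparameters as reported in the paper; structure is illustrative.
CONFIGS = {
    "default":         {"epochs": 500, "warmup": 10, "lr": 5e-4, "weight_decay": 0.05},
    "inaturalist2019": {"epochs": 100, "warmup": 5,  "lr": 5e-4, "weight_decay": 0.05},
    "stanford_cars":   {"epochs": 500, "warmup": 10, "lr": 5e-3, "weight_decay": 0.05},
}

def lr_at_epoch(cfg, epoch):
    """Linear warm-up to the initial LR, then cosine decay to zero.

    The cosine decay is an assumed schedule, not stated in the paper.
    """
    if epoch < cfg["warmup"]:
        return cfg["lr"] * (epoch + 1) / cfg["warmup"]
    t = (epoch - cfg["warmup"]) / max(1, cfg["epochs"] - cfg["warmup"])
    return 0.5 * cfg["lr"] * (1.0 + math.cos(math.pi * t))
```

For example, the "default" schedule ramps from 5e-5 at epoch 0 up to the full 5e-4 at the end of the 10-epoch warm-up, while Stanford Cars follows the same shape at the 10x-higher peak rate.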