Cluster-Learngene: Inheriting Adaptive Clusters for Vision Transformers
Authors: Qiufeng Wang, Xu Yang, Fu Feng, Jing Wang, Xin Geng
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experimentation, we demonstrate that Cluster-Learngene not only is more efficient compared to other initialization methods but also customizes models of elastic scales according to downstream task resources. |
| Researcher Affiliation | Academia | (1) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University, Ministry of Education, China. {qfwang, xuyang_palm, fufeng, wangjing91, xgeng}@seu.edu.cn |
| Pseudocode | Yes | The pseudocode for this phase is presented in Algorithm 1. ... Algorithm 1: Adaptively Cluster for MSA (a hedged sketch of this head-clustering step is given after the table) |
| Open Source Code | No | The paper does not provide a direct link to a code repository or explicitly state that the source code for the described methodology is publicly released or available in supplementary materials. While the NeurIPS checklist indicates 'Yes' for code access, this is a self-assessment and not a concrete statement within the paper's content. |
| Open Datasets | Yes | Datasets. To condense the learngene, we employ ImageNet-1K, a collection of 1.2 million training images and 50,000 validation images distributed across 1,000 classes as part of the ILSVRC2012 competition [9]. After initializing the descendant models with the learngene, we proceed to fine-tune these models on diverse downstream tasks. These tasks include iNaturalist-2019 [45], Food101 [4], Oxford Flowers [38], Stanford Cars [12], CIFAR-10 [24], CIFAR-100 [24], CUB-200-2011 [48]. For detailed dataset descriptions, see Appendix A.2. ... Table 6: Characteristics of the downstream datasets (listing '#Training' counts for each dataset). |
| Dataset Splits | Yes | Datasets. To condense the learngene, we employ ImageNet-1K, a collection of 1.2 million training images and 50,000 validation images distributed across 1,000 classes as part of the ILSVRC2012 competition [9]. ... Table 6: Characteristics of the downstream datasets (listing '#Validation' counts for each dataset). |
| Hardware Specification | No | The paper does not specify any particular GPU models, CPU models, or other detailed hardware used for running the experiments. It only mentions 'Architectures. Both the ancestry model and descendant models are variants derived from DeiT'. |
| Software Dependencies | No | The paper mentions architectures like 'DeiT' and 'Swin Transformer' but does not list any specific software libraries or frameworks (e.g., PyTorch, TensorFlow) along with their version numbers. |
| Experiment Setup | Yes | Training settings. During the learngene clustering, we set Eps ε = 10, MinHds = 1, ensuring that each attention head is included in a unique cluster. In the learngene inheriting phase, we train the descendant models on downstream tasks for 500 epochs, including a 10-epoch warm-up period, except for iNaturalist-2019, where we train for 100 epochs with a 5-epoch warm-up. The initial learning rate is set to 5 × 10⁻⁴ for most tasks, except for Stanford Cars where it is 5 × 10⁻³, and the weight decay is 0.05. (These hyperparameters are restated in the config sketch after the table.) |
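
The Pseudocode and Experiment Setup rows describe a density-based grouping of multi-head self-attention (MSA) heads governed by Eps ε = 10 and MinHds = 1. The paper's Algorithm 1 is not reproduced in this report, so the following is only a minimal sketch of that kind of clustering, assuming a per-head feature vector (e.g., flattened projection weights), a Euclidean metric, and scikit-learn's DBSCAN; only the ε and MinHds values come from the quoted settings.

```python
# Hypothetical sketch of density-based clustering of attention heads,
# in the spirit of "Algorithm 1: Adaptively Cluster for MSA".
# The per-head feature and the use of DBSCAN are assumptions; only
# eps = 10 and min_hds = 1 are taken from the quoted training settings.
import numpy as np
from sklearn.cluster import DBSCAN


def cluster_heads(head_features: np.ndarray, eps: float = 10.0, min_hds: int = 1) -> np.ndarray:
    """Assign each attention head of one layer to a cluster.

    head_features: (num_heads, feature_dim) array, one row per head.
    Returns one integer cluster label per head.
    """
    return DBSCAN(eps=eps, min_samples=min_hds).fit_predict(head_features)


# Toy usage: 12 heads, each described by a 64-dimensional feature vector.
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 64))
print(cluster_heads(features))  # one label per head, e.g. [0 1 2 ...]
```

With min_samples = 1, DBSCAN never marks a point as noise, which matches the quoted requirement that every attention head is included in some cluster.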
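
For quick reference, the fine-tuning settings quoted in the Experiment Setup row can be restated as a plain configuration sketch. The field names below are illustrative and not taken from the authors' code; only the values come from the quote.

```python
# Fine-tuning hyperparameters as quoted in the Experiment Setup row.
# Field names are illustrative; only the values come from the paper.
base_cfg = {
    "epochs": 500,          # downstream fine-tuning epochs
    "warmup_epochs": 10,    # warm-up period
    "lr": 5e-4,             # initial learning rate
    "weight_decay": 0.05,
}

# Dataset-specific deviations mentioned in the same quote.
overrides = {
    "iNaturalist-2019": {"epochs": 100, "warmup_epochs": 5},
    "Stanford Cars": {"lr": 5e-3},
}


def config_for(dataset: str) -> dict:
    """Merge the base settings with any dataset-specific override."""
    return {**base_cfg, **overrides.get(dataset, {})}


print(config_for("Stanford Cars"))
# {'epochs': 500, 'warmup_epochs': 10, 'lr': 0.005, 'weight_decay': 0.05}
```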