Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Transformer as Linear Expansion of Learngene
Authors: Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on ImageNet-1K demonstrate that TLEG achieves comparable or better performance in contrast to many individual models trained from scratch, while reducing around 2× training cost. When transferring to several downstream classification datasets, TLEG surpasses existing initialization methods by a large margin (e.g., +6.87% on iNat 2019 and +7.66% on CIFAR-100). |
| Researcher Affiliation | Academia | Shiyu Xia, Miaosen Zhang, Xu Yang\*, Ruiming Chen, Haokun Chen, Xin Geng\*; School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Source code is available at https://github.com/AlphaXia/TLEG. |
| Open Datasets | Yes | We conduct experiments on ImageNet-1K (Deng et al. 2009) and several middle/small-scale datasets including iNaturalist 2019 (iNat19) (Zhou et al. 2020), Mini-ImageNet (Mi-INet) (Vinyals et al. 2016), Tiny-ImageNet (Ti-INet) (Le and Yang 2015), CIFAR-10 (C-10), CIFAR-100 (C-100) (Krizhevsky, Hinton et al. 2009) and Food-101 (F-101) (Bossard, Guillaumin, and Van Gool 2014). |
| Dataset Splits | No | The paper mentions using standard datasets like ImageNet-1K, CIFAR-10, CIFAR-100, etc., but does not explicitly provide the train/validation/test split percentages or sample counts within the text. |
| Hardware Specification | No | No specific hardware details (like GPU models, CPU types, or memory) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | For Aux-S and Des-S of 10 different depths, we train Aux-S for 150 epochs and each Des-S for 35 epochs, except that we train 11-layer Des-S for 45 epochs. ... For Aux-B and Des-B of 10 different depths, we train Aux-B for 100 epochs and each Des-B for 40 epochs. ... For Aux-Ti and Des-Ti of 4 different depths, we train Aux-Ti for 150 epochs and each Des-Ti for 50 epochs. ... we introduce one distillation loss: $\mathcal{L}_D = \mathrm{KL}(\phi(z_s/\tau), \phi(z_t/\tau))$, ... our total training loss is defined as: $\mathcal{L} = (1-\lambda)\,\mathrm{CE}(\phi(z_s), y) + \lambda\,\mathcal{L}_D$ |
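The training objective quoted in the Experiment Setup row can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not code from the TLEG repository. It assumes that ϕ is the softmax, that z_s and z_t are student and teacher logits for a single example, and that the KL arguments are read literally as KL(P‖Q) with P = ϕ(z_s/τ) and Q = ϕ(z_t/τ). No τ² rescaling is applied because none appears in the quoted formula. The helper `tleg_style_loss` and the values λ = 0.5 and τ = 2.0 in the usage lines are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; assumed to be what phi denotes."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """CE(phi(z), y) = -log phi(z)[y] for an integer class label y."""
    return -math.log(probs[label])


def tleg_style_loss(z_s: Sequence[float], z_t: Sequence[float],
                    y: int, lam: float, tau: float) -> float:
    """Hypothetical helper for L = (1 - lam) * CE(phi(z_s), y) + lam * L_D.

    L_D = KL(phi(z_s / tau), phi(z_t / tau)); the argument order is an
    assumption taken literally from the quoted formula.
    """
    p_s = softmax([z / tau for z in z_s])  # temperature-softened student distribution
    p_t = softmax([z / tau for z in z_t])  # temperature-softened teacher distribution
    loss_d = kl_divergence(p_s, p_t)
    return (1.0 - lam) * cross_entropy(softmax(z_s), y) + lam * loss_d


if __name__ == "__main__":
    # Illustrative inputs only; lam and tau are not taken from the paper.
    student_logits = [2.0, 0.5, -1.0]
    teacher_logits = [1.5, 1.0, -0.5]
    print(tleg_style_loss(student_logits, teacher_logits, y=0, lam=0.5, tau=2.0))
```

Setting λ = 0 reduces the objective to plain cross-entropy, and identical student and teacher logits make the distillation term vanish. Many distillation codebases use the teacher distribution as the reference, i.e. KL(ϕ(z_t/τ)‖ϕ(z_s/τ)), which corresponds to swapping the arguments of `kl_divergence`. Which direction TLEG uses would need to be checked against the released code.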