Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Flextron: Many-in-One Flexible Large Language Model
Authors: Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FLEXTRON on the GPT-3 and LLama-2 family of LLMs, and demonstrate superior performance over multiple end-to-end trained variants and other state-of-the-art elastic networks, all with a single pretraining run that consumes a mere 7.63% tokens compared to original pretraining. |
| Researcher Affiliation | Collaboration | Ruisi Cai (1, 2), Saurav Muralidharan (1), Greg Heinrich (1), Hongxu Yin (1), Zhangyang Wang (2), Jan Kautz (1), Pavlo Molchanov (1). (1) NVIDIA, (2) The University of Texas at Austin. |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper mentions TensorRT-LLM with a GitHub link, but this is a third-party tool used for measurement, not the authors' open-source code for FLEXTRON. No explicit statement about making FLEXTRON's code available was found. |
| Open Datasets | Yes | We perform our evaluation on the GPT-3 and Llama-2 (Touvron et al., 2023) model families. GPT-3 ... trained on 1.1 trillion tokens, where data is obtained from publicly available data sources, comprising 53 languages and code. ... We further validate our approach using the Llama2-7B model (Touvron et al., 2023)... We additionally compare our method with representative open-source model families, including Pythia (Biderman et al., 2023), OpenLLaMA (Geng & Liu, 2023)... Wikimedia Foundation. Wikimedia downloads. URL https://dumps.wikimedia.org. |
| Dataset Splits | No | The paper mentions a 'validation loss' in Appendix A and a 'validation step' in Section 4.1, which implies that a validation set was used, but it does not specify explicit train/validation/test splits (by percentage or count) in the main text or experimental settings. |
| Hardware Specification | Yes | All results are tested on the NVIDIA A100 80GB GPU, with latency measured when the prompting length and generation length is set to 8 and 512, respectively. We use the batch size of 1. |
| Software Dependencies | No | The paper mentions the 'NeMo framework' and 'TensorRT-LLM' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | As described in Section 3.1, during elastic network pretraining, we first perform importance sorting of each head/neuron in MHA/MLP layers using a tiny fraction (512 samples) of the full training set. We then perform training of the sorted and permuted elastic model. We use a batch-size of 256, and tune the model for 80000 steps. At each step, we randomly construct 3 sub-models together with the full model; perform gradient accumulation for all 4 models for a single update. We perform lightweight tuning for automatic network selection: we freeze the backbone parameters and only tune the routers and surrogate models for 1000 steps using a batch size of 256. |
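The Experiment Setup row says each head/neuron in the MHA/MLP layers is importance-sorted on 512 samples before the elastic model is trained. The following sketch illustrates the general idea for a single MLP layer in plain Python. It assumes mean ReLU activation over the calibration samples as the importance score, which may differ from FLEXTRON's actual criterion, and all function names are hypothetical. Permuting the rows of the input projection together with the matching columns of the output projection leaves the full model's output unchanged, while a narrower sub-model can keep just the first `k` neurons.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def neuron_importance(w_in: Matrix, calib: Matrix) -> List[float]:
    """Assumed score: mean ReLU activation of each hidden neuron on calib data.

    w_in[j] holds the input weights of hidden neuron j; calib is a list of
    calibration input vectors (the table reports 512 samples).
    """
    scores = [0.0] * len(w_in)
    for x in calib:
        for j, row in enumerate(w_in):
            scores[j] += relu(sum(w * xi for w, xi in zip(row, x)))
    return [s / len(calib) for s in scores]


def sort_and_permute(w_in: Matrix, w_out: Matrix,
                     calib: Matrix) -> Tuple[List[int], Matrix, Matrix]:
    """Reorder hidden neurons by descending importance.

    w_out[k][j] connects hidden neuron j to output k, so the same permutation
    is applied to the rows of w_in and to the columns of w_out.
    """
    scores = neuron_importance(w_in, calib)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    new_w_in = [list(w_in[j]) for j in order]
    new_w_out = [[row[j] for j in order] for row in w_out]
    return order, new_w_in, new_w_out


def mlp(w_in: Matrix, w_out: Matrix, x: List[float], width: int) -> List[float]:
    """Evaluate the MLP using only the first `width` hidden neurons."""
    hidden = [relu(sum(w * xi for w, xi in zip(w_in[j], x))) for j in range(width)]
    return [sum(row[j] * hidden[j] for j in range(width)) for row in w_out]


# Toy example: 2 inputs, 3 hidden neurons, 1 output, 2 calibration samples.
w_in = [[0.1, 0.0], [1.0, 1.0], [0.5, 0.0]]
w_out = [[10.0, 20.0, 30.0]]
calib = [[1.0, 1.0], [2.0, 0.0]]
order, w_in_s, w_out_s = sort_and_permute(w_in, w_out, calib)
print(order)  # [1, 2, 0]
```

After sorting, a sub-model of width `k` is `mlp(w_in_s, w_out_s, x, k)`, which keeps the `k` highest-scoring neurons under the assumed metric.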
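The Experiment Setup row also describes each training step: three randomly constructed sub-models are run together with the full model, and their gradients are accumulated into a single update. The sketch below shows that loop on a toy two-layer linear network whose sub-models differ only in hidden width, with hand-derived gradients so it needs no framework. The allowed widths, uniform width sampling, squared-error loss, plain SGD, and averaging (rather than summing) the accumulated gradients are all assumptions for illustration, not details confirmed by the table.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
Batch = Sequence[Tuple[Vector, float]]


def forward(w: List[Vector], v: Vector, x: Vector, width: int) -> Tuple[float, Vector]:
    """Output of the sub-network that keeps only the first `width` hidden units."""
    hidden = [sum(wi * xi for wi, xi in zip(w[j], x)) for j in range(width)]
    return sum(v[j] * hidden[j] for j in range(width)), hidden


def loss(w: List[Vector], v: Vector, batch: Batch, widths: Sequence[int]) -> float:
    """Summed loss 0.5 * (y - t)^2 over the given sub-model widths and samples."""
    return sum(0.5 * (forward(w, v, x, k)[0] - t) ** 2 for k in widths for x, t in batch)


def elastic_step(w: List[Vector], v: Vector, batch: Batch, widths: Sequence[int],
                 num_random: int = 3, lr: float = 0.01,
                 rng: Optional[random.Random] = None) -> List[int]:
    """Accumulate gradients of `num_random` random sub-models plus the full
    model, then apply one SGD update in place. Returns the widths used."""
    rng = rng or random.Random()
    full = len(v)
    sampled = [rng.choice(widths) for _ in range(num_random)] + [full]
    grad_w = [[0.0] * len(row) for row in w]
    grad_v = [0.0] * full
    for width in sampled:                      # gradient accumulation
        for x, t in batch:
            y, hidden = forward(w, v, x, width)
            err = y - t                        # dL/dy for L = 0.5 * (y - t)^2
            for j in range(width):             # units beyond `width` get no gradient
                grad_v[j] += err * hidden[j]
                for i, xi in enumerate(x):
                    grad_w[j][i] += err * v[j] * xi
    scale = lr / (len(sampled) * len(batch))   # average, then a single update
    for j in range(full):
        v[j] -= scale * grad_v[j]
        for i in range(len(w[j])):
            w[j][i] -= scale * grad_w[j][i]
    return sampled


w = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.2], [0.2, 0.1]]
v = [0.5, -0.2, 0.3, 0.1]
batch = [([1.0, 0.5], 1.0), ([0.5, -1.0], -0.5)]
before_w, before_v = [list(r) for r in w], list(v)
used = elastic_step(w, v, batch, widths=[1, 2, 3], rng=random.Random(0))
print(used[-1])  # 4: the full model is always included
```

Always including the full model guarantees that every hidden unit receives some gradient at every step, while the randomly drawn sub-models train the leading units to work on their own.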
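The Research Type row reports that elastic pretraining consumes 7.63% of the original pretraining tokens, and the Open Datasets row gives 1.1 trillion tokens for the GPT-3 base model. The Experiment Setup row gives a batch size of 256 and 80,000 steps but no sequence length, so the arithmetic below treats sequence length as a parameter. It also assumes that each sequence is counted once per step, i.e. that the four sub-models in a step share one batch. Neither that counting rule nor the 4096-token sequence length used in the example is stated in the table.

```python
def tokens_seen(batch_size: int, steps: int, seq_len: int) -> int:
    """Training tokens consumed, counting each sequence once per step (assumption)."""
    return batch_size * steps * seq_len


def fraction_of_pretraining(batch_size: int, steps: int, seq_len: int,
                            pretrain_tokens: float) -> float:
    """Share of the original pretraining token budget used by elastic training."""
    return tokens_seen(batch_size, steps, seq_len) / pretrain_tokens


# Batch size and steps come from the table; seq_len=4096 is a hypothetical value.
frac = fraction_of_pretraining(256, 80_000, 4096, 1.1e12)
print(f"{tokens_seen(256, 80_000, 4096):,} tokens, {frac:.2%} of pretraining")
# 83,886,080,000 tokens, 7.63% of pretraining
```

Under that assumed sequence length the result is consistent with the reported 7.63%, but this is only a consistency check, not a statement of the paper's actual configuration.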
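The Hardware Specification row fixes the latency protocol: batch size 1, an 8-token prompt, and 512 generated tokens on an NVIDIA A100 80GB GPU. A minimal timing harness for that protocol might look like the sketch below. The `generate` callable is a hypothetical stand-in for whatever inference engine is benchmarked (the table mentions TensorRT-LLM, whose API is not used here), and the warm-up runs and median aggregation are assumptions rather than details from the paper. When timing real GPU work, the callable must block until the device has finished, otherwise wall-clock numbers undercount.

```python
import statistics
import time
from typing import Callable, List, Sequence

# Hypothetical signature: (prompt token ids, tokens to generate) -> generated ids.
GenerateFn = Callable[[Sequence[int], int], List[int]]


def measure_latency(generate: GenerateFn, prompt_len: int = 8, gen_len: int = 512,
                    warmup: int = 2, runs: int = 5) -> float:
    """Median wall-clock seconds for one batch-size-1 generation request."""
    prompt = [0] * prompt_len  # dummy token ids for a single sequence (batch size 1)
    for _ in range(warmup):    # warm-up runs are excluded from the timings
        generate(prompt, gen_len)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        out = generate(prompt, gen_len)
        timings.append(time.perf_counter() - start)
        if len(out) != gen_len:
            raise ValueError(f"expected {gen_len} tokens, got {len(out)}")
    return statistics.median(timings)


def dummy_generate(prompt: Sequence[int], n: int) -> List[int]:
    """Stand-in model that emits n copies of the last prompt token."""
    return [prompt[-1]] * n


print(f"{measure_latency(dummy_generate) * 1e3:.3f} ms")
```

Reporting the median rather than the mean keeps a single slow run (for example, one that triggers a late allocation) from dominating the figure.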