Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment
Authors: Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, Yingyan (Celine) Lin
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments validate that AmoebaLLM not only sets new standards in LLM adaptability but also successfully delivers subnets that achieve state-of-the-art trade-offs between accuracy and efficiency. |
| Researcher Affiliation | Academia | Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, Yingyan (Celine) Lin Georgia Institute of Technology EMAIL |
| Pseudocode | No | The paper describes methodologies in text and figures but does not include structured pseudocode or algorithm blocks with explicit labels like 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | Our code is available at https://github.com/GATECH-EIC/AmoebaLLM. |
| Open Datasets | Yes | Following [7, 9], we adopt 50K samples from Alpaca [40] for our one-for-all fine-tuning as well as for fine-tuning all baselines. For both our method and the baselines, we adopt a constant learning rate of 2e-4 with an AdamW optimizer and a LoRA rank of 64, and fine-tune for 10K iterations. |
| Dataset Splits | Yes | During each fine-tuning iteration, we employ the sandwich sampling [11, 13, 14] to sample $K$ subnets $\{T_i\}_{i=1}^{K}$ with different layer/width remaining ratios, including the largest/smallest ones and $K-2$ random ones from our design space. Detailed layer/width configurations of sampled subsets can be obtained from the strategies derived in Sec. 3.2. We fine-tune our SMoL adapter as detailed in Sec. 3.3 by accumulating the gradients from all sampled subnets using in-place distillation, where only the loss of the largest subnet $T_1$ is calculated using ground truth, while those of other subnets $\{T_i\}_{i=2}^{K}$ use distillation from the largest one [11]. |
| Hardware Specification | Yes | We profile these workloads using (1) two devices, including an NVIDIA A5000 consumer-level GPU and an NVIDIA Jetson Orin NX edge GPU; |
| Software Dependencies | No | The paper mentions 'TensorRT-LLM [19], MLC-LLM [20], and vanilla PyTorch [21]' as deployment flows but does not specify their version numbers or other software dependencies with versions. |
| Experiment Setup | Yes | For both our method and the baselines, we adopt a constant learning rate of 2e-4 with an AdamW optimizer and a LoRA rank of 64, and fine-tune for 10K iterations. It takes 40 GPU hours on an NVIDIA A5000 GPU for our one-for-all fine-tuning. |
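
The sandwich sampling and in-place distillation quoted in the Dataset Splits row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `sandwich_sample` and `loss_targets`, the flat list of remaining ratios, and drawing the random picks with replacement are all assumptions made for the example. The paper derives its actual layer/width configurations from the strategies in its Sec. 3.2.

```python
import random


def sandwich_sample(ratios, k, rng=None):
    """Pick k remaining ratios: the largest, the smallest, and k - 2 random ones.

    `ratios` is a hypothetical flat design space of remaining ratios.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    ordered = sorted(set(ratios), reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two distinct ratios")
    rng = rng or random.Random()

    largest, smallest = ordered[0], ordered[-1]
    # Assumption: random picks come from the intermediate ratios, with
    # replacement; fall back to the full list if there are none.
    pool = ordered[1:-1] or ordered
    randoms = [rng.choice(pool) for _ in range(k - 2)]
    return [largest, smallest] + randoms


def loss_targets(subnets):
    """Pair each sampled subnet with the supervision it would receive.

    Only the first (largest) subnet is trained against ground truth; the
    rest are distilled from the largest one.
    """
    return [
        ("ground_truth" if i == 0 else "distill_from_largest", ratio)
        for i, ratio in enumerate(subnets)
    ]


if __name__ == "__main__":
    picked = sandwich_sample([1.0, 0.75, 0.5, 0.25], k=4, rng=random.Random(0))
    for target, ratio in loss_targets(picked):
        print(f"ratio={ratio:.2f} -> {target}")
```

In a real training loop, the gradients from all `k` subnets would be accumulated before a single optimizer step, as the quoted passage describes.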