AmoebaLLM: Constructing Any-Shape Large Language Models for Efficient and Instant Deployment

Authors: Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, Yingyan (Celine) Lin

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate that AmoebaLLM not only sets new standards in LLM adaptability but also successfully delivers subnets that achieve state-of-the-art trade-offs between accuracy and efficiency.
Researcher Affiliation | Academia | Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, Yingyan (Celine) Lin. Georgia Institute of Technology. {yonggan.fu, celine.lin}@gatech.edu
Pseudocode | No | The paper describes methodologies in text and figures but does not include structured pseudocode or algorithm blocks with explicit labels such as 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | Our code is available at https://github.com/GATECH-EIC/AmoebaLLM.
Open Datasets | Yes | Following [7, 9], we adopt 50K samples from Alpaca [40] for our one-for-all fine-tuning as well as for fine-tuning all baselines. For both our method and the baselines, we adopt a constant learning rate of 2e-4 with an AdamW optimizer and a LoRA rank of 64, and fine-tune for 10K iterations.
Dataset Splits | Yes | During each fine-tuning iteration, we employ the sandwich sampling [11, 13, 14] to sample K subnets {T_i}_{i=1}^K with different layer/width remaining ratios, including the largest/smallest ones and K-2 random ones from our design space. Detailed layer/width configurations of sampled subnets can be obtained from the strategies derived in Sec. 3.2. We fine-tune our SMoL adapter as detailed in Sec. 3.3 by accumulating the gradients from all sampled subnets using in-place distillation, where only the loss of the largest subnet T_1 is calculated using ground truth, while those of other subnets {T_i}_{i=2}^K use distillation from the largest one [11]. (A code sketch of this training step is given after the table.)
Hardware Specification | Yes | We profile these workloads using (1) two devices, including an NVIDIA A5000 consumer-level GPU and an NVIDIA Jetson Orin NX edge GPU;
Software Dependencies | No | The paper mentions 'TensorRT-LLM [19], MLC-LLM [20], and vanilla PyTorch [21]' as deployment flows but does not specify their version numbers or other software dependencies with versions.
Experiment Setup | Yes | For both our method and the baselines, we adopt a constant learning rate of 2e-4 with an AdamW optimizer and a LoRA rank of 64, and fine-tune for 10K iterations. It takes 40 GPU hours on an NVIDIA A5000 GPU for our one-for-all fine-tuning. (A minimal configuration sketch follows the table.)
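
To make the reported fine-tuning setup concrete, below is a minimal configuration sketch assuming a Hugging Face Transformers + PEFT stack. The learning rate, optimizer, LoRA rank, and iteration count follow the quoted text; the base model name, LoRA alpha, and target modules are illustrative assumptions not stated in the excerpt.

```python
# Minimal sketch of the reported fine-tuning configuration:
# constant LR 2e-4, AdamW, LoRA rank 64, 10K iterations on 50K Alpaca samples.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Base model is an assumption; the excerpt does not name it.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_cfg = LoraConfig(
    r=64,                                  # LoRA rank 64, as reported
    lora_alpha=16,                         # assumed; not stated in the excerpt
    target_modules=["q_proj", "v_proj"],   # assumed projection layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)  # constant LR, no scheduler
NUM_ITERATIONS = 10_000
```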
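
The sandwich sampling and in-place distillation procedure quoted under "Dataset Splits" can be summarized as the following training-step sketch. Helpers such as `activate_subnet` and `sample_random_config`, and the value of K, are hypothetical placeholders for AmoebaLLM's subnet-selection machinery; only the loss structure (ground-truth loss for the largest subnet, distillation from it for the others, gradient accumulation over all sampled subnets) follows the paper.

```python
# One fine-tuning iteration with sandwich sampling and in-place distillation (sketch).
import torch
import torch.nn.functional as F

K = 4  # number of subnets sampled per iteration (value assumed)

def sandwich_step(model, optimizer, batch, largest_cfg, smallest_cfg,
                  sample_random_config, activate_subnet):
    optimizer.zero_grad()

    # Sandwich rule: always include the largest and smallest subnets,
    # plus K-2 randomly sampled layer/width configurations.
    configs = [largest_cfg, smallest_cfg] + [sample_random_config() for _ in range(K - 2)]

    # The largest subnet is trained against the ground-truth labels.
    activate_subnet(model, largest_cfg)
    teacher_logits = model(**batch).logits
    loss = F.cross_entropy(
        teacher_logits.view(-1, teacher_logits.size(-1)),
        batch["labels"].view(-1),
        ignore_index=-100,
    )
    loss.backward()
    teacher_probs = teacher_logits.detach().softmax(dim=-1)

    # The remaining subnets are distilled from the largest one (in-place distillation);
    # gradients from all sampled subnets accumulate before a single optimizer step.
    for cfg in configs[1:]:
        activate_subnet(model, cfg)
        student_logits = model(**batch).logits
        kd_loss = F.kl_div(
            student_logits.log_softmax(dim=-1), teacher_probs, reduction="batchmean"
        )
        kd_loss.backward()

    optimizer.step()
```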