EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

Authors: Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted comprehensive evaluations of EAGLE, including all models from the Vicuna and LLaMA2-Chat series, the MoE model Mixtral 8x7B Instruct, and tasks in dialogue, code generation, mathematical reasoning, and instruction following.
Researcher Affiliation | Collaboration | Peking University; University of Waterloo; Microsoft Research; Vector Institute.
Pseudocode | Yes | Algorithm 1: Multi-round speculative sampling (a hedged sketch of the acceptance rule it builds on follows the table).
Open Source Code | Yes | The code is available at https://github.com/SafeAILab/EAGLE.
Open Datasets | Yes | We evaluated EAGLE across multiple tasks including multi-turn dialogue, code generation, mathematical reasoning, and instruction following, employing the MT-bench (Zheng et al., 2023), HumanEval (Chen et al., 2021), GSM8K (Cobbe et al., 2021), and Alpaca (Taori et al., 2023) datasets, respectively. ... EAGLE was trained on the ShareGPT dataset, utilizing 68,000 dialogue iterations with a learning rate set at 3e-5.
Dataset Splits | No | The paper mentions using specific datasets for evaluation (MT-bench, HumanEval, GSM8K, Alpaca) and for training (ShareGPT), but it does not provide explicit train/validation/test splits (e.g., percentages or counts) for reproduction purposes. These datasets are typically used as test/evaluation sets.
Hardware Specification | Yes | For example, with gpt-fast (PyTorch Labs, 2023), EAGLE accelerates LLaMA2-Chat 7B decoding to 160.4 tokens/s on a single RTX 3090 GPU. ... The training is completed in 1-2 days on 4x A100 (40G) GPUs. ... For Vicuna 7B as the target LLM, operating under a memory constraint of a single RTX 3090 with 24G of CUDA memory... In the case of LLaMA2-Chat 70B, constrained by 4 A100 (40G) GPUs totaling 160G of CUDA memory...
Software Dependencies | No | The paper mentions 'gpt-fast (PyTorch Labs, 2023)' as a tool used in combination with EAGLE, but it does not specify version numbers for Python, PyTorch, CUDA, or other key software components used in the experimental setup.
Experiment Setup | Yes | By integrating regression loss and classification loss, we train the Autoregression Head using the combined loss function L = L_reg + w_cls · L_cls. Typically, the classification loss is an order of magnitude larger than the regression loss in numerical terms. Consequently, we set w_cls to 0.1. ... We employed data augmentation by adding random noise sampled from a uniform distribution U(−0.1, 0.1) to features of the target LLM during training (Jain et al., 2023). ... EAGLE was trained on the ShareGPT dataset, utilizing 68,000 dialogue iterations with a learning rate set at 3e-5. We employed the AdamW optimizer with beta values (β1, β2) set to (0.9, 0.95) and implemented gradient clipping of 0.5. ... All evaluations were conducted at FP16 precision. (A hedged training-step sketch also follows the table.)
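
Algorithm 1 in the paper (Multi-round speculative sampling) extends the standard speculative sampling acceptance rule to multiple candidate tokens per position. The sketch below shows only that underlying single-draft accept/resample step as a minimal Python illustration; the function name, the NumPy usage, and the probability-vector interface are assumptions, not the paper's implementation.

```python
# Minimal sketch of the single-draft speculative sampling acceptance rule that
# multi-round variants such as Algorithm 1 build on. Names and the use of
# NumPy are illustrative assumptions, not the paper's code.
import numpy as np

def speculative_accept(draft_token: int,
                       p_target: np.ndarray,
                       q_draft: np.ndarray,
                       rng: np.random.Generator) -> int:
    """Accept or resample a token proposed by the draft model.

    p_target, q_draft: probability vectors over the vocabulary from the
    target LLM and the draft model at the same position. The returned token
    is distributed exactly according to p_target.
    """
    # Accept the draft token with probability min(1, p(x) / q(x)).
    accept_prob = min(1.0, p_target[draft_token] / max(q_draft[draft_token], 1e-12))
    if rng.random() < accept_prob:
        return draft_token
    # On rejection, resample from the residual distribution max(0, p - q),
    # renormalized; this keeps the output distribution unchanged.
    residual = np.clip(p_target - q_draft, 0.0, None)
    residual /= residual.sum()
    return int(rng.choice(len(residual), p=residual))
```

The multi-round variant in Algorithm 1 applies this accept/resample logic repeatedly across several drafted candidates per position rather than a single draft token.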
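
The training details quoted in the Experiment Setup row (combined loss L = L_reg + w_cls · L_cls with w_cls = 0.1, AdamW with betas (0.9, 0.95), learning rate 3e-5, gradient clipping at 0.5, and U(−0.1, 0.1) noise added to the target-LLM features) can be collected into a single training step. The sketch below is a hedged illustration under those settings; the DraftHead module, the hidden/vocabulary sizes, and the use of Smooth L1 for the regression term are placeholders and assumptions, not EAGLE's released code.

```python
# Hedged sketch of the quoted training configuration, not EAGLE's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, VOCAB = 4096, 32000  # illustrative sizes, not taken from the paper

class DraftHead(nn.Module):
    """Placeholder one-layer head standing in for EAGLE's Autoregression Head."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, HIDDEN)    # predicts the next feature vector
        self.lm_head = nn.Linear(HIDDEN, VOCAB)  # maps features to token logits

    def forward(self, feats):
        pred = self.proj(feats)
        return pred, self.lm_head(pred)

draft_head = DraftHead()
w_cls = 0.1  # classification loss is ~10x the regression loss, so it is down-weighted
optimizer = torch.optim.AdamW(draft_head.parameters(), lr=3e-5, betas=(0.9, 0.95))

def train_step(features, next_features, next_token):
    # Data augmentation: add uniform noise U(-0.1, 0.1) to the target-LLM features.
    noisy = features + (torch.rand_like(features) * 0.2 - 0.1)
    pred_features, logits = draft_head(noisy)
    # Combined loss L = L_reg + w_cls * L_cls (Smooth L1 for L_reg is an assumption).
    loss = (F.smooth_l1_loss(pred_features, next_features)
            + w_cls * F.cross_entropy(logits, next_token))
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(draft_head.parameters(), 0.5)  # clipping at 0.5
    optimizer.step()
    return loss.item()
```

A call such as train_step(features, next_features, next_token) with tensors of shape (batch, HIDDEN), (batch, HIDDEN), and (batch,) would run one optimizer step under the quoted hyperparameters.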