Accelerated Speculative Sampling Based on Tree Monte Carlo

Authors: Zhengmian Hu, Heng Huang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that when a single reference token is used, the average number of accepted tokens for ASpS is the same as that of SpS. This is expected, as the issue with SpS is that it does not apply maximum coupling across the entire space. When there is only one reference token, the token space is effectively the full space, and hence SpS already achieves optimal coupling in this context, leaving no room for ASpS to improve upon.
Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, USA.
Pseudocode | Yes | The method defined in the previous section can be translated into pseudo-code in Algorithm 1.
Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code is publicly available.
Open Datasets | Yes | In our experiments, we compare the Speculative Sampling (SpS) and Accelerated Speculative Sampling (ASpS) methods using the LLaMa-7b model (Touvron et al., 2023) as the target model and the LLaMa-68m model (Miao et al., 2023) as the reference model, on a translation task from the WMT16 dataset (Bojar et al., 2016).
Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit details about train/validation/test splits, such as percentages or sample counts.
Hardware Specification | No | The paper mentions computational complexity and inference speed but does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | Our implementation uses the Huggingface library (Wolf et al., 2019). The paper does not provide specific version numbers for the Huggingface library or other software dependencies.
Experiment Setup | Yes | In our experiments, we compare the Speculative Sampling (SpS) and Accelerated Speculative Sampling (ASpS) methods using the LLaMa-7b model (Touvron et al., 2023) as the target model and the LLaMa-68m model (Miao et al., 2023) as the reference model, on a translation task from the WMT16 dataset (Bojar et al., 2016). The complete experiment results with more tasks and model configurations are moved to Appendix C due to space limitations.
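
The Research Type entry above notes that with a single reference token, standard speculative sampling already achieves optimal coupling. A minimal sketch of why, assuming toy distributions: for one token, the acceptance probability of the standard accept/reject rule is the sum of min(p, q) over tokens, which equals 1 minus the total variation distance between the target distribution p and the reference distribution q, the upper bound attained by a maximal coupling. This is not the paper's Algorithm 1; the function names and the example distributions are hypothetical and chosen only for illustration.

```python
import random

def acceptance_probability(p, q):
    """Chance that single-token speculative sampling accepts the draft token.
    p: target distribution, q: reference (draft) distribution."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def speculative_step(p, q, rng):
    """One single-token speculative sampling step (standard accept/reject rule).
    Returns (token, accepted). The output token is distributed according to p."""
    tokens = list(range(len(q)))
    x = rng.choices(tokens, weights=q)[0]          # draft token from q
    if rng.random() < min(1.0, p[x] / q[x]):       # accept with prob min(1, p/q)
        return x, True
    # On rejection, resample from the normalized residual max(p - q, 0).
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(tokens, weights=residual)[0], False

# Hypothetical toy distributions over a 3-token vocabulary.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
rng = random.Random(0)
samples = [speculative_step(p, q, rng) for _ in range(20000)]
empirical_accept = sum(acc for _, acc in samples) / len(samples)
print(acceptance_probability(p, q), 1 - total_variation(p, q), empirical_accept)
```

The first two printed values coincide exactly (up to floating-point rounding), and the empirical acceptance rate should land close to them. With several reference tokens, the per-token rule no longer couples over the full joint space, which is the gap the entry says ASpS targets.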