Accelerated Speculative Sampling Based on Tree Monte Carlo
Authors: Zhengmian Hu, Heng Huang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show that when a single reference token is used, the average number of accepted tokens for ASpS is the same as that of SpS. This is expected, as the issue with SpS is that it does not apply maximum coupling across the entire space. When there is only one reference token, the token space is effectively the full space, and hence SpS already achieves optimal coupling in this context, leaving no room for ASpS to improve upon. |
| Researcher Affiliation | Academia | Department of Computer Science, University of Maryland, College Park, USA. |
| Pseudocode | Yes | The method defined in the previous section can be translated into pseudo-code in Algorithm 1. |
| Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code is publicly available. |
| Open Datasets | Yes | In our experiments, we compare the Speculative Sampling (SpS) and Accelerated Speculative Sampling (ASpS) methods using the LLaMa-7b model (Touvron et al., 2023) as the target model and the LLaMa-68m model (Miao et al., 2023) as the reference model, on a translation task from the WMT16 dataset (Bojar et al., 2016). |
| Dataset Splits | No | The paper mentions using specific datasets but does not provide explicit details about train/validation/test splits, such as percentages or sample counts. |
| Hardware Specification | No | The paper mentions computational complexity and inference speed but does not provide specific hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | Our implementation uses the Huggingface library (Wolf et al., 2019). The paper does not provide specific version numbers for the Huggingface library or other software dependencies. |
| Experiment Setup | Yes | In our experiments, we compare the Speculative Sampling (SpS) and Accelerated Speculative Sampling (ASpS) methods using the LLaMa-7b model (Touvron et al., 2023) as the target model and the LLaMa-68m model (Miao et al., 2023) as the reference model, on a translation task from the WMT16 dataset (Bojar et al., 2016). The complete experiment results with more tasks and model configurations are moved to Appendix C due to space limitations. |
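The Research Type row states that with a single reference token, SpS already achieves optimal coupling. As an illustration of that claim, the following is a minimal Python sketch of the standard speculative sampling acceptance rule for one drafted token. It is not the paper's ASpS procedure (Algorithm 1), which is not reproduced here. It assumes the reference distribution q and the target distribution p are given as small dictionaries over a toy vocabulary, and the function names `speculative_sample_one` and `acceptance_probability` are hypothetical. A drafted token x ~ q is accepted with probability min(1, p(x)/q(x)); otherwise a replacement is drawn from the normalized residual max(p - q, 0). The output then follows p exactly, and the acceptance probability is sum_x min(p(x), q(x)) = 1 - TV(p, q), the largest agreement probability that any coupling of p and q can reach.

```python
import random
from typing import Dict, Tuple

# Hypothetical representation: a token distribution maps token -> probability.
Dist = Dict[str, float]


def acceptance_probability(p: Dist, q: Dist) -> float:
    """Expected acceptance rate of one drafted token: sum_x min(p(x), q(x)).

    This equals 1 - TV(p, q), the maximum probability with which any coupling
    of q (reference) and p (target) makes the two draws agree.
    """
    support = set(p) | set(q)
    return sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in support)


def speculative_sample_one(p: Dist, q: Dist, rng: random.Random) -> Tuple[str, bool]:
    """Draft one token from q, then accept it or resample so the output follows p.

    Returns (token, accepted).
    """
    tokens = list(q)
    # Draft step: draw a candidate token from the reference distribution q.
    x = rng.choices(tokens, weights=[q[t] for t in tokens])[0]
    # Accept with probability min(1, p(x) / q(x)); q[x] > 0 because x was drawn from q.
    if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
        return x, True
    # Reject: resample from the residual max(p - q, 0), normalized to sum to 1.
    residual = {t: max(p[t] - q.get(t, 0.0), 0.0) for t in p}
    total = sum(residual.values())
    if total == 0.0:
        # Guard against floating-point round-off when p and q are (nearly) equal.
        return x, True
    y = rng.choices(list(residual), weights=list(residual.values()))[0]
    return y, False


if __name__ == "__main__":
    p = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical target distribution
    q = {"a": 0.2, "b": 0.5, "c": 0.3}  # hypothetical reference distribution
    rng = random.Random(0)
    n = 100_000
    draws = [speculative_sample_one(p, q, rng) for _ in range(n)]
    print("theoretical acceptance:", acceptance_probability(p, q))
    print("empirical acceptance:", sum(acc for _, acc in draws) / n)
    for t in p:
        print(t, "frequency:", sum(1 for tok, _ in draws if tok == t) / n)
```

With these toy distributions the theoretical acceptance rate is about 0.7, and the empirical token frequencies should be close to p. The residual step is what keeps the output distribution equal to p even though drafts come from q.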