Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Authors: Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, Tat-Seng Chua
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across diverse benchmarks validate its merit in boosting both LLM performance and inference speed. In this section, we conduct experiments to address the following research question: |
| Researcher Affiliation | Collaboration | ¹National University of Singapore, ²Harbin Institute of Technology, ³Huawei Noah's Ark Lab, ⁴Central South University, ⁵Harbin Institute of Technology, Shenzhen {EMAIL, EMAIL} |
| Pseudocode | Yes | We also provide the pseudo-code of L-MTP in Appendix B.1. |
| Open Source Code | Yes | The source code is available at https://github.com/Xiaohao-Liu/L-MTP. |
| Open Datasets | Yes | We curate the training dataset from Math [53], Evol-Instruct-Code [54, 55], and Alpaca GPT4 [56]. |
| Dataset Splits | Yes | For the second stage, we randomly select 10,000 examples with a ratio of 4:4:2, corresponding to math, code, and general data, respectively. At the continued training stage, we downsample the dataset randomly, where we take 4,000 examples for both code and math datasets and 2,000 examples for the general dataset. Therefore, we prepare 10K examples for continuing to train the model. |
| Hardware Specification | Yes | All the experiments are conducted on 2 NVIDIA H100-80G GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | At the head warming up stage, we freeze the LLM backbone while training the heads with a learning rate of 1 × 10⁻³ for 5 epochs. We utilize the cosine scheduler and set the warmup ratio as 0.1. At the next stage, we utilize LoRA [64] with rank being 32 and alpha being 16 to tune the full model. Here we only train the model for 3 epochs with the learning rate being 1 × 10⁻⁵. We set k = 2 and n = 4 by default. |
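
The Dataset Splits and Experiment Setup rows can be summarized in a small configuration sketch. This is an illustration only, not the authors' code: the names `StageConfig`, `HEAD_WARMUP`, `CONTINUED`, `SAMPLE_SIZES`, and `downsample` are hypothetical, the random seed is arbitrary, and the learning rates are read as 1e-3 and 1e-5 on the assumption that the quoted exponents are negative. Applying the cosine scheduler and the 0.1 warmup ratio to both stages is also an assumption, since the quoted text does not say which stage they belong to.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one training stage (hypothetical container)."""
    name: str
    learning_rate: float
    epochs: int
    freeze_backbone: bool
    lora_rank: Optional[int] = None
    lora_alpha: Optional[int] = None
    scheduler: str = "cosine"      # assumed to apply to both stages
    warmup_ratio: float = 0.1      # assumed to apply to both stages


# Stage 1: backbone frozen, only the prediction heads are trained.
HEAD_WARMUP = StageConfig("head_warmup", learning_rate=1e-3, epochs=5,
                          freeze_backbone=True)

# Stage 2: the full model is tuned through LoRA adapters (rank 32, alpha 16).
CONTINUED = StageConfig("continued_training", learning_rate=1e-5, epochs=3,
                        freeze_backbone=False, lora_rank=32, lora_alpha=16)

# 4:4:2 mix of math, code, and general data, 10,000 examples in total.
SAMPLE_SIZES = {"math": 4000, "code": 4000, "general": 2000}


def downsample(pools: Dict[str, List], sizes: Dict[str, int],
               seed: int = 0) -> List:
    """Randomly draw sizes[name] examples from each pool, then shuffle."""
    rng = random.Random(seed)
    mixed: List = []
    for name, count in sizes.items():
        mixed.extend(rng.sample(pools[name], count))
    rng.shuffle(mixed)
    return mixed
```

Calling `downsample(pools, SAMPLE_SIZES)` with `pools` mapping `"math"`, `"code"`, and `"general"` to lists of examples returns a shuffled list of 10,000 items. Each pool must hold at least as many examples as are requested from it, because `random.sample` draws without replacement.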