Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models

Authors: Xiaohao Liu, Xiaobo Xia, Weixiang Zhao, Manyi Zhang, Xianzhi Yu, Xiu Su, Shuo Yang, See-Kiong Ng, Tat-Seng Chua

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments across diverse benchmarks validate its merit in boosting both LLM performance and inference speed. In this section, we conduct experiments to address the following research question:
Researcher Affiliation	Collaboration	1 National University of Singapore; 2 Harbin Institute of Technology; 3 Huawei Noah's Ark Lab; 4 Central South University; 5 Harbin Institute of Technology, Shenzhen {EMAIL, EMAIL}
Pseudocode	Yes	We also provide the pseudo-code of L-MTP in Appendix B.1.
Open Source Code	Yes	The source code is available at https://github.com/Xiaohao-Liu/L-MTP.
Open Datasets	Yes	We curate the training dataset from Math [53], Evol-Instruct-Code [54, 55], and Alpaca GPT4 [56].
Dataset Splits	Yes	For the second stage, we randomly select 10,000 examples with a ratio of 4:4:2, corresponding to math, code, and general data, respectively. At the continued training stage, we downsample the dataset randomly, where we take 4,000 examples for both code and math datasets and 2,000 examples for the general dataset. Therefore, we prepare 10K examples for continuing to train the model.
Hardware Specification	Yes	All the experiments are conducted on 2 NVIDIA H100-80G GPUs.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup	Yes	At the head warming up stage, we freeze the LLM backbone while training the heads with a learning rate of 1 × 10⁻³ for 5 epochs. We utilize the cosine scheduler and set the warmup ratio as 0.1. At the next stage, we utilize LoRA [64] with rank being 32 and alpha being 16 to tune the full model. Here we only train the model for 3 epochs with the learning rate being 1 × 10⁻⁵. We set k = 2 and n = 4 by default.
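
The data mix and training schedule quoted in the table can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the names `STAGES`, `MIX`, `downsample`, and `cosine_lr` are hypothetical, the learning rates are read as 1e-3 and 1e-5, and the schedule shape (linear warmup followed by cosine decay to zero) is an assumption, since the excerpt names only a "cosine scheduler" with a warmup ratio of 0.1. The excerpt also does not say whether the warmup ratio applies to the second stage, so it is recorded for the head warm-up stage only.

```python
import math
import random

# Stage settings as quoted in the "Experiment Setup" row.
# Key names are hypothetical; values come from the quoted text.
STAGES = {
    "head_warmup": {"lr": 1e-3, "epochs": 5, "warmup_ratio": 0.1},
    "lora_tuning": {"lr": 1e-5, "epochs": 3, "lora_rank": 32, "lora_alpha": 16},
}

# Per-source sample counts from the "Dataset Splits" row (4:4:2 ratio).
MIX = {"math": 4000, "code": 4000, "general": 2000}


def downsample(pools, mix, seed=0):
    """Randomly draw mix[name] examples, without replacement, from each pool."""
    rng = random.Random(seed)
    return {name: rng.sample(pools[name], count) for name, count in mix.items()}


def cosine_lr(step, total_steps, peak_lr, warmup_ratio):
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

For example, `downsample` applied to three lists of candidate examples returns 10,000 items in total, and `cosine_lr(step, total_steps, STAGES["head_warmup"]["lr"], 0.1)` gives the head warm-up learning rate at a given step under the assumed schedule.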