Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules

Authors: Binghui Li, Fengling Chen, Zixun Huang, Lean Wang, Lei Wu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, experiments on LLMs ranging from 0.1B to 1B parameters demonstrate the practical relevance of FSL as a surrogate model for fitting and predicting loss trajectories in large-scale pre-training. 6 Experiments; 6.1 Power-Law Kernel Regression; 6.2 LLM Pre-training
Researcher Affiliation | Academia | Binghui Li [1], Fengling Chen [2], Zixun Huang [2], Lean Wang [3], Lei Wu [1,2,4]. [1] Center for Machine Learning Research, Peking University; [2] School of Mathematical Sciences, Peking University; [3] State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; [4] AI for Science Institute, Beijing
Pseudocode | No | The paper describes mathematical derivations and experimental results but does not include any explicitly labeled pseudocode or algorithm blocks in a structured format.
Open Source Code | No | 5. Open access to data and code. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: While we provide detailed descriptions of all experimental setups and hyperparameters in Section 6 and Appendix C, we do not currently release code or datasets.
Open Datasets | No | 5. Open access to data and code. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: While we provide detailed descriptions of all experimental setups and hyperparameters in Section 6 and Appendix C, we do not currently release code or datasets.
Dataset Splits | No | The paper mentions training on '20B tokens' and '10,000 steps' but does not specify how these datasets are split into training, validation, or testing sets, nor does it refer to standard splits with citations.
Hardware Specification | No | 8. Experiments compute resources. Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [No] Justification: While we describe the experimental setup in full detail, we do not currently report the specific compute resources used.
Software Dependencies | No | 9. Code of ethics [Answer: No for this section in the paper checklist. Assuming it means No for software dependencies, as it is the closest related checklist item.]
Experiment Setup | Yes | In each experiment, we adopt a PLKR configuration with M = N = 128, σ = 3 and employ the top-M projection matrix... For each parameter configuration, we execute 200 independent SGD runs with a batch size of 1 over 10,000 steps. We fit FSL on the loss curve of a 1B Qwen MoE model trained on 20B tokens with batch size 288, maximum learning rate η_0 = 0.001, and the 8-1-1 scheduler over a total step of K = 33907.
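
The figures quoted in the Experiment Setup row can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes that the batch size of 288 counts sequences and that every step consumes the same number of tokens, neither of which the quoted text states. The variable names are hypothetical.

```python
# Figures quoted in the Experiment Setup row.
TOTAL_TOKENS = 20e9   # "20B tokens"
BATCH_SIZE = 288      # "batch size 288"
TOTAL_STEPS = 33907   # "K = 33907"

# Assumption (not stated in the source): batch size counts sequences,
# and each optimizer step consumes the same number of tokens.
tokens_per_step = TOTAL_TOKENS / TOTAL_STEPS
tokens_per_sequence = tokens_per_step / BATCH_SIZE

# PLKR experiments: 200 independent SGD runs, batch size 1, 10,000 steps each.
PLKR_RUNS = 200
PLKR_STEPS = 10_000
plkr_updates_per_config = PLKR_RUNS * PLKR_STEPS

print(f"tokens per step:     {tokens_per_step:,.0f}")
print(f"tokens per sequence: {tokens_per_sequence:.1f}")
print(f"PLKR SGD updates per configuration: {plkr_updates_per_config:,}")
```

Under these assumptions the LLM run works out to roughly 2,048 tokens per sequence, so the reported token count, batch size, and step count are mutually consistent with that context length, although the quoted text never states a sequence length. Each PLKR parameter configuration corresponds to 2,000,000 single-sample SGD updates in total.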