Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Improving Monte Carlo Tree Search for Symbolic Regression

Authors: Zhengyao Huang, Daniel Huang, Tiannan Xiao, Dina Ma, Zhenyu Ming, Hao Shi, Yuanhui Wen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct a thorough numerical study of the impact of these improvements and benchmark our approach against existing symbolic regression methods on a variety of datasets, including both ground-truth and black-box datasets. Our approach achieves competitive performance with state-of-the-art libraries in terms of recovery rate and attains favorable positions on the Pareto frontier of accuracy versus model complexity.
Researcher Affiliation	Collaboration	Center for Machine Learning Research, Peking University, Beijing, China; Beijing International Center for Mathematical Research, Center for Machine Learning Research, Peking University, Beijing, China; Huawei Technologies Ltd., Beijing, China; Department of Mathematical Sciences, Tsinghua University, Beijing, China.
Pseudocode	Yes	Detailed pseudocode of the algorithm is provided in Algorithm 1. (Algorithm 1: Improved MCTS, Algorithm 2: Backward Propagation, Algorithm 3: Forward Propagation)
Open Source Code	Yes	Code is available at https://github.com/PKU-CMEGroup/MCTS-4-SR.
Open Datasets	Yes	The Basic Benchmarks include several ground-truth datasets where the true closed-form expressions are known: Nguyen [16], Nguyen C [16], Jin [48], and Livermore [23]. The SRBench Black-box Benchmarks (SRBench) [10, 49] feature more challenging datasets: Feynman [17], Strogatz [50], and the Black-box collection.
Dataset Splits	Yes	Each dataset is split into training and testing subsets (75%/25%) using a fixed random seed.
Hardware Specification	Yes	All experiments were conducted on machines delivering 10.6 TFLOPS of FP32 compute performance and 256GB RAM.
Software Dependencies	No	The paper mentions software like SciPy [52] and Sympy [56] but does not specify any version numbers for these or any other software components.
Experiment Setup	Yes	The hyperparameter configurations used in the comparative study are summarized in Table 4. Note that while the values of ps, ϵ, and the maximum expression evaluation budget vary in Appendix F, all other settings and experimental conditions remain consistent.
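
The Dataset Splits row above reports a 75%/25% train/test split with a fixed random seed. A minimal sketch of what such a split can look like is given below. It is not taken from the paper or its repository: the function name `train_test_split_indices`, the seed value `0`, and the sample count of 100 are assumptions chosen for illustration only.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) for a shuffled split.

    Hypothetical helper for illustration; the seed value is an assumption,
    not the one used in the paper.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    # A local generator keeps the split reproducible without touching
    # the global random state.
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    # The first n_test shuffled indices form the test set, the rest train.
    return sorted(indices[n_test:]), sorted(indices[:n_test])


train_idx, test_idx = train_test_split_indices(100, test_fraction=0.25, seed=0)
print(len(train_idx), len(test_idx))  # 75 25
```

Calling the function again with the same seed returns the same indices, which is the property a fixed seed is meant to provide: any reader re-running the split evaluates on the same held-out rows.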