Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Improving Monte Carlo Tree Search for Symbolic Regression

Authors: Zhengyao Huang, Daniel Huang, Tiannan Xiao, Dina Ma, Zhenyu Ming, Hao Shi, Yuanhui Wen

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a thorough numerical study to the impact of these improvements and benchmark our approach against existing symbolic regression methods on a variety of datasets, including both ground-truth and black-box datasets. Our approach achieves competitive performance with state-of-the-art libraries in terms of recovery rate, attains favorable positions on the Pareto frontier of accuracy versus model complexity.
Researcher Affiliation | Collaboration | Center for Machine Learning Research, Peking University, Beijing, China. Corresponding author; Beijing International Center for Mathematical Research, Center for Machine Learning Research, Peking University, Beijing, China. Huawei Technologies Ltd., Beijing, China. Department of Mathematical Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | Detailed pseudocode of the algorithm is provided in Algorithm 1. (Algorithm 1: Improved MCTS, Algorithm 2: Backward Propagation, Algorithm 3: Forward Propagation)
Open Source Code | Yes | Code is available at https://github.com/PKU-CMEGroup/MCTS-4-SR.
Open Datasets | Yes | The Basic Benchmarks include several ground-truth datasets where the true closed-form expressions are known: Nguyen [16], Nguyen C [16], Jin [48], and Livermore [23]. The SRBench Black-box Benchmarks (SRBench) [10, 49] feature more challenging datasets: Feynman [17], Strogatz [50], and the Black-box collection.
Dataset Splits | Yes | Each dataset is split into training and testing subsets (75%/25%) using a fixed random seed.
Hardware Specification | Yes | All experiments were conducted on machines delivering 10.6 TFLOPS of FP32 compute performance and 256GB RAM.
Software Dependencies | No | The paper mentions software like SciPy [52] and Sympy [56] but does not specify any version numbers for these or any other software components.
Experiment Setup | Yes | The hyperparameter configurations used in the comparative study are summarized in Table 4. Note that while the values of ps, ϵ, and the maximum expression evaluation budget vary in Appendix F, all other settings and experimental conditions remain consistent.
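
The Dataset Splits entry describes a 75%/25% train/test split driven by a fixed random seed. The sketch below illustrates one way such a split could be made reproducible using only the Python standard library. It is not taken from the paper or its repository: the function name split_train_test, the seed value 0, and the toy data are all hypothetical, and the authors' actual seed and splitting code are not stated in this summary.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Return (train, test) lists from rows using a seeded shuffle.

    Hypothetical helper: the seed and rounding rule are assumptions,
    not the paper's documented procedure.
    """
    rng = random.Random(seed)  # local generator; global random state is untouched
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


# Toy (x, y) samples standing in for a benchmark dataset.
rows = [(x, x * x + x) for x in range(20)]

train, test = split_train_test(rows, seed=0)
train_again, test_again = split_train_test(rows, seed=0)
```

Calling the function twice with the same seed yields identical partitions, which is the property a fixed seed is meant to guarantee. Note that reproducing the paper's exact splits would additionally require the authors' seed value and the same shuffling routine.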