Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery

Authors: Yoshitomo Matsubara, Naoya Chiba, Ryo Igarashi, Yoshitaka Ushiku

DMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct benchmark experiments on our new SRSD datasets using various representative SR methods. The experimental results show that we provide a more realistic performance evaluation, and our user study shows that the NED correlates with human judges significantly more than an existing SR metric.
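The NED mentioned in the response above is a normalized edit distance between predicted and ground-truth equations. As an illustration only (the paper's exact equation preprocessing and tree representation are not reproduced here), a generic normalized Levenshtein distance over token sequences can be sketched as:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance over two token sequences
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution/match
            prev = cur
    return dp[n]

def normalized_edit_distance(pred_tokens, true_tokens):
    # scale to [0, 1] by the longer sequence: 0 = identical, 1 = fully different
    denom = max(len(pred_tokens), len(true_tokens)) or 1
    return levenshtein(pred_tokens, true_tokens) / denom

# hypothetical tokenized expressions
print(normalized_edit_distance(["x", "*", "x"], ["x", "*", "y"]))  # → 0.3333333333333333
```

A distance-based score like this can be compared against human similarity judgments, which is the kind of correlation the user study measures.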
Researcher Affiliation | Collaboration | Yoshitomo Matsubara (Amazon Alexa, USA), Naoya Chiba (Tohoku University, Japan), Ryo Igarashi (OMRON SINIC X Corporation, Japan), Yoshitaka Ushiku (OMRON SINIC X Corporation, Japan)
Pseudocode | No | The paper describes methods and procedures in narrative text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We publish repositories of our code and 240 SRSD datasets. Code: https://github.com/omron-sinicx/srsd-benchmark
Open Datasets | Yes | We publish repositories of our code and 240 SRSD datasets. Dataset repositories:
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_easy
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_medium
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_hard
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_easy_dummy
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_medium_dummy
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_hard_dummy
Dataset Splits | Yes | Each of the 120 SRSD datasets consists of 10,000 samples and has train, val, and test splits with a ratio of 8:1:1.
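An 8:1:1 split of a 10,000-sample dataset can be sketched as follows (the seed and shuffling procedure are illustrative assumptions, not the authors' actual split):

```python
import random

def split_811(samples, seed=42):
    # shuffle indices, then cut into 80% train / 10% val / 10% test
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_train = int(0.8 * len(samples))
    n_val = int(0.1 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

train, val, test = split_811(list(range(10_000)))
print(len(train), len(val), len(test))  # → 8000 1000 1000
```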
Hardware Specification | Yes | We run 1,680 high performance computing (HPC) jobs in total, using compute nodes in an HPC cluster, which have 5-20 assigned physical CPU cores, 30-120 GB RAM, and 720 GB local storage. ... For E2E, we used an NVIDIA RTX 3090Ti.
Software Dependencies | No | The paper mentions using Optuna for hyperparameter optimization and sympy for symbolic mathematics, as well as various symbolic regression libraries (gplearn, AFP, AFP-FE, AIF, DSR, E2E, uDSR, PySR), but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Tables 19 and 20 show the hyperparameter space for symbolic regression baselines considered in this study. The hyperparameters of gplearn (Koza and Poli, 2005), AFP (Schmidt and Lipson, 2011), and AFP-FE (Schmidt and Lipson, 2009) are optimized by Optuna (Akiba et al., 2019), a hyperparameter optimization framework. For E2E (Kamienny et al., 2022), we reuse the checkpoint of the pretrained model the authors provided. We choose hyperparameters of other methods based on suggestions in their code and/or papers.
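The paper's Optuna-based tuning is not reproduced here; as a stdlib-only illustration of searching a hyperparameter space like the one described, a random search over a hypothetical genetic-programming-style space (the names and ranges below are assumptions, not the paper's Tables 19-20) could look like:

```python
import random

def sample_config(rng):
    # hypothetical SR hyperparameters: categorical population size,
    # log-uniform parsimony coefficient in [1e-4, 1e-1]
    return {
        "population_size": rng.choice([500, 1000, 2000, 4000]),
        "parsimony_coefficient": 10 ** rng.uniform(-4, -1),
    }

def random_search(objective, n_trials=20, seed=0):
    # keep the configuration with the lowest validation error
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# toy objective standing in for a validation-set error
toy = lambda cfg: abs(cfg["parsimony_coefficient"] - 0.01) + 1 / cfg["population_size"]
best_cfg, best_score = random_search(toy)
```

A framework like Optuna replaces the naive sampler above with adaptive suggestion strategies and trial pruning, but the search loop has the same shape.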