Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Rethinking Symbolic Regression Datasets and Benchmarks for Scientific Discovery

Authors: Yoshitomo Matsubara, Naoya Chiba, Ryo Igarashi, Yoshitaka Ushiku

DMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct benchmark experiments on our new SRSD datasets using various representative SR methods. The experimental results show that we provide a more realistic performance evaluation, and our user study shows that the NED correlates with human judges significantly more than an existing SR metric.
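The NED mentioned in the response above is a normalized edit distance between predicted and ground-truth equations. As an illustration only (the paper's exact equation preprocessing and tree representation are not reproduced here), a generic normalized Levenshtein distance over token sequences can be sketched as:

```python
def levenshtein(a, b):
    # classic dynamic-programming edit distance over two token sequences
    m, n = len(a), len(b)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (a[i - 1] != b[j - 1]))  # substitution/match
            prev = cur
    return dp[n]

def normalized_edit_distance(pred_tokens, true_tokens):
    # scale to [0, 1] by the longer sequence: 0 = identical, 1 = fully different
    denom = max(len(pred_tokens), len(true_tokens)) or 1
    return levenshtein(pred_tokens, true_tokens) / denom

# hypothetical tokenized expressions
print(normalized_edit_distance(["x", "*", "x"], ["x", "*", "y"]))  # → 0.3333333333333333
```

A distance-based score like this can be compared against human similarity judgments, which is the kind of correlation the user study measures.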
Researcher Affiliation | Collaboration | Yoshitomo Matsubara (Amazon Alexa, USA), Naoya Chiba (Tohoku University, Japan), Ryo Igarashi (OMRON SINIC X Corporation, Japan), Yoshitaka Ushiku (OMRON SINIC X Corporation, Japan)
Pseudocode | No | The paper describes methods and procedures in narrative text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We publish repositories of our code and 240 SRSD datasets. Code: https://github.com/omron-sinicx/srsd-benchmark
Open Datasets | Yes | We publish repositories of our code and 240 SRSD datasets. Dataset repositories:
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_easy
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_medium
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_hard
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_easy_dummy
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_medium_dummy
- https://huggingface.co/datasets/yoshitomo-matsubara/srsd-feynman_hard_dummy
Dataset Splits | Yes | Each of the 120 SRSD datasets consists of 10,000 samples and has train, val, and test splits with a ratio of 8:1:1.
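An 8:1:1 split of a 10,000-sample dataset can be sketched as follows (the seed and shuffling procedure are illustrative assumptions, not the authors' actual split):

```python
import random

def split_811(samples, seed=42):
    # shuffle indices, then cut into 80% train / 10% val / 10% test
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    n_train = int(0.8 * len(samples))
    n_val = int(0.1 * len(samples))
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test

train, val, test = split_811(list(range(10_000)))
print(len(train), len(val), len(test))  # → 8000 1000 1000
```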
Hardware Specification | Yes | We run 1,680 high performance computing (HPC) jobs in total, using compute nodes in an HPC cluster, which have 5-20 assigned physical CPU cores, 30-120 GB RAM, and 720 GB local storage. ... For E2E, we used an NVIDIA RTX 3090Ti.
Software Dependencies | No | The paper mentions using Optuna for hyperparameter optimization and sympy for symbolic mathematics, as well as various symbolic regression libraries (gplearn, AFP, AFP-FE, AIF, DSR, E2E, uDSR, PySR), but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | Tables 19 and 20 show the hyperparameter space for symbolic regression baselines considered in this study. The hyperparameters of gplearn (Koza and Poli, 2005), AFP (Schmidt and Lipson, 2011), and AFP-FE (Schmidt and Lipson, 2009) are optimized by Optuna (Akiba et al., 2019), a hyperparameter optimization framework. For E2E (Kamienny et al., 2022), we reuse the checkpoint of the pretrained model the authors provided. We choose hyperparameters of other methods based on suggestions in their code and/or papers.
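The paper's Optuna-based tuning is not reproduced here; as a stdlib-only illustration of searching a hyperparameter space like the one described, a random search over a hypothetical genetic-programming-style space (the names and ranges below are assumptions, not the paper's Tables 19-20) could look like:

```python
import random

def sample_config(rng):
    # hypothetical SR hyperparameters: categorical population size,
    # log-uniform parsimony coefficient in [1e-4, 1e-1]
    return {
        "population_size": rng.choice([500, 1000, 2000, 4000]),
        "parsimony_coefficient": 10 ** rng.uniform(-4, -1),
    }

def random_search(objective, n_trials=20, seed=0):
    # keep the configuration with the lowest validation error
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# toy objective standing in for a validation-set error
toy = lambda cfg: abs(cfg["parsimony_coefficient"] - 0.01) + 1 / cfg["population_size"]
best_cfg, best_score = random_search(toy)
```

A framework like Optuna replaces the naive sampler above with adaptive suggestion strategies and trial pruning, but the search loop has the same shape.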