A Unified Framework for Deep Symbolic Regression
Authors: Mikel Landajuela, Chak Shing Lee, Jiachen Yang, Ruben Glatt, Claudio P Santiago, Ignacio Aravena, Terrell Mundhenk, Garrett Mulcahy, Brenden K Petersen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on empirical evaluation using SRBench, a new community tool for benchmarking symbolic regression methods, our unified framework achieves state-of-the-art performance in its ability to (1) symbolically recover analytical expressions, (2) fit datasets with high accuracy, and (3) balance accuracy-complexity trade-offs, across 252 ground-truth and black-box benchmark problems, in both noiseless settings and across various noise levels. Finally, we provide practical use case-based guidance for constructing hybrid symbolic regression algorithms, supported by extensive, combinatorial ablation studies. |
| Researcher Affiliation | Academia | Computational Engineering Division Lawrence Livermore National Laboratory Livermore, CA 94550 |
| Pseudocode | Yes | Algorithm 1 provides high-level pseudocode for uDSR. More detailed pseudocode is available in Algorithm 2 of Appendix A. |
| Open Source Code | Yes | uDSR source code is provided at https://github.com/brendenpetersen/deep-symbolic-optimization. |
| Open Datasets | Yes | SRBench features 130 problems with hidden ground-truth analytic solutions and 122 real-world datasets with no known analytic model (black-box problems) from the PMLB database (Olson et al., 2017). Our benchmarks use the PMLB database, cited in Section 5. |
| Dataset Splits | No | The paper mentions using SRBench with 130 ground-truth problems and 122 black-box problems, and that SRBench uses 'held-out test data', but it does not specify the train/validation/test split percentages or sample counts for the datasets used in its own experiments. It states that hyperparameters are in Appendix F, but this typically does not include dataset split details. |
| Hardware Specification | Yes | Computational resources used are described in Appendix J. |
| Software Dependencies | No | The paper mentions specific software components like 'SymPy', 'L-BFGS-B', and 'DSR', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all experiments, we use a minimal set of tokens: +, −, ×, ÷, SIN, COS, EXP, LOG, SQRT, 1.0, CONST, and (except for the appropriate ablations) LINEAR. For simplicity, our choice of Φ (basis functions for LINEAR) includes only polynomial terms up to degree 3. Hyperparameters are listed in Table 3 of Appendix F. |
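The LINEAR token described above fits a linear combination of basis functions Φ to the data; with Φ restricted to polynomial terms up to degree 3, this reduces to a least-squares fit over {1, x, x², x³}. The sketch below is a hypothetical illustration of that idea (it is not the uDSR API; the function name `linear_token_fit` is invented for this example):

```python
import numpy as np

def linear_token_fit(x, y):
    """Hypothetical sketch: fit the LINEAR token's basis Phi = {1, x, x^2, x^3}
    (polynomial terms up to degree 3, as in the paper's choice of Phi)
    to targets y by ordinary least squares."""
    # Design matrix with one column per basis function.
    phi = np.column_stack([x**d for d in range(4)])
    # Solve min_beta ||phi @ beta - y||_2 for the coefficients.
    beta, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return beta, phi @ beta

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 + 0.5 * x - 3.0 * x**3   # a target exactly representable in the basis
beta, y_hat = linear_token_fit(x, y)
print(np.round(beta, 3))          # recovers the coefficients [2, 0.5, 0, -3]
```

Because the target lies in the span of the degree-3 basis, the fit recovers the generating coefficients exactly (up to numerical precision); for targets outside the span, the same call returns the best least-squares approximation.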