A Unified Framework for Deep Symbolic Regression
Authors: Mikel Landajuela, Chak Shing Lee, Jiachen Yang, Ruben Glatt, Claudio P Santiago, Ignacio Aravena, Terrell Mundhenk, Garrett Mulcahy, Brenden K Petersen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Based on empirical evaluation using SRBench, a new community tool for benchmarking symbolic regression methods, our unified framework achieves state-of-the-art performance in its ability to (1) symbolically recover analytical expressions, (2) fit datasets with high accuracy, and (3) balance accuracy-complexity trade-offs, across 252 ground-truth and black-box benchmark problems, in both noiseless settings and across various noise levels. Finally, we provide practical use case-based guidance for constructing hybrid symbolic regression algorithms, supported by extensive, combinatorial ablation studies. |
| Researcher Affiliation | Academia | Computational Engineering Division Lawrence Livermore National Laboratory Livermore, CA 94550 |
| Pseudocode | Yes | Algorithm 1 provides high-level pseudocode for uDSR. More detailed pseudocode is available in Algorithm 2 of Appendix A. |
| Open Source Code | Yes | uDSR source code is provided at https://github.com/brendenpetersen/deep-symbolic-optimization. |
| Open Datasets | Yes | SRBench features 130 problems with hidden ground-truth analytic solutions and 122 real-world datasets with no known analytic model (black-box problems) from the PMLB database (Olson et al., 2017). Our benchmarks use the PMLB database, cited in Section 5. |
| Dataset Splits | No | The paper mentions using SRBench with 130 ground-truth problems and 122 black-box problems, and that SRBench uses 'held-out test data', but it does not specify the train/validation/test split percentages or sample counts for the datasets used in its own experiments. It states that hyperparameters are in Appendix F, but this typically does not include dataset split details. |
| Hardware Specification | Yes | Computational resources used are described in Appendix J. |
| Software Dependencies | No | The paper mentions specific software components like 'SymPy', 'L-BFGS-B', and 'DSR', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For all experiments, we use a minimal set of tokens: +, −, ×, ÷, SIN, COS, EXP, LOG, SQRT, 1.0, CONST, and (except for the appropriate ablations) LINEAR. For simplicity, our choice of Φ (basis functions for LINEAR) includes only polynomial terms up to degree 3. Hyperparameters are listed in Table 3 of Appendix F. |
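The LINEAR token described above fits a linear combination of basis functions Φ to the data; with Φ restricted to polynomial terms up to degree 3, this reduces to a least-squares fit over {1, x, x², x³}. The sketch below is a hypothetical illustration of that idea (it is not the uDSR API; the function name `linear_token_fit` is invented for this example):

```python
import numpy as np

def linear_token_fit(x, y):
    """Hypothetical sketch: fit the LINEAR token's basis Phi = {1, x, x^2, x^3}
    (polynomial terms up to degree 3, as in the paper's choice of Phi)
    to targets y by ordinary least squares."""
    # Design matrix with one column per basis function.
    phi = np.column_stack([x**d for d in range(4)])
    # Solve min_beta ||phi @ beta - y||_2 for the coefficients.
    beta, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return beta, phi @ beta

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 + 0.5 * x - 3.0 * x**3   # a target exactly representable in the basis
beta, y_hat = linear_token_fit(x, y)
print(np.round(beta, 3))          # recovers the coefficients [2, 0.5, 0, -3]
```

Because the target lies in the span of the degree-3 basis, the fit recovers the generating coefficients exactly (up to numerical precision); for targets outside the span, the same call returns the best least-squares approximation.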