reproducibilityindex.ai

Transformer-based Planning for Symbolic Regression

Authors: Parshin Shojaee, Kazem Meidani, Amir Barati Farimani, Chandan Reddy

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on various datasets show that our approach outperforms state-of-the-art methods, enhancing the model's fitting-complexity trade-off, extrapolation abilities, and robustness to noise.
Researcher Affiliation	Academia	Parshin Shojaee (1), Kazem Meidani (2), Amir Barati Farimani (2,3), Chandan K. Reddy (1). (1) Department of Computer Science, Virginia Tech; (2) Department of Mechanical Engineering, Carnegie Mellon University; (3) Machine Learning Department, Carnegie Mellon University
Pseudocode	No	The paper describes the steps of TPSR but does not present them in a structured pseudocode or algorithm block format.
Open Source Code	Yes	The codes are available at: https://github.com/deep-symbolic-mathematics/TPSR
Open Datasets	Yes	We evaluate TPSR and various baseline methods on standard SR benchmark datasets from Penn Machine Learning Benchmark (PMLB) [43] studied in SRBench [42], as well as In-domain Synthetic Data generated based on [38, 18]. The benchmark datasets include 119 equations from Feynman Lectures on Physics database series [44], 14 symbolic regression problems from the ODE-Strogatz database [45], and 57 Black-box regression problems without known underlying equations.
Dataset Splits	No	The paper mentions '400 validation functions' for the In-domain Synthetic Data and uses SRBench datasets, but does not provide specific train/validation/test split percentages, sample counts, or detailed splitting methodology (e.g., random seed, stratified splitting) needed for reproducibility. It implies a 'test set' for evaluation and 'training points' for some analysis, but lacks explicit, comprehensive split details for all datasets.
Hardware Specification	No	The paper does not provide specific details on the hardware (e.g., CPU, GPU models, memory) used to run the experiments.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies or libraries used in the implementation or experimentation.
Experiment Setup	Yes	For the E2E baseline, we use the settings reported in [18], including beam/sample size of C = 10 candidates, and the refinement of all the candidates K = 10. For our model, we use the width of tree search as k_max = 3, number of rollouts r = 3, and simulation beam size b = 1 as the default setting.
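
The default settings quoted in the Experiment Setup row can be recorded in a small configuration sketch, which may help when rerunning the comparison. The class and field names below (`TPSRSearchConfig`, `E2EBaselineConfig`, `width`, `rollouts`, `beam_size`, `candidates`, `refinements`) are hypothetical and chosen for illustration. They are not the flags or identifiers used in the TPSR repository. Only the numeric values come from the paper's quoted text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TPSRSearchConfig:
    """Hypothetical container for the TPSR defaults quoted above."""
    width: int = 3       # k_max: width of the tree search
    rollouts: int = 3    # r: number of rollouts
    beam_size: int = 1   # b: simulation beam size


@dataclass(frozen=True)
class E2EBaselineConfig:
    """Hypothetical container for the E2E baseline settings quoted above."""
    candidates: int = 10   # C: beam/sample size
    refinements: int = 10  # K: number of candidates refined


def describe(cfg) -> str:
    """Render a config as 'name=value' pairs in field order."""
    return ", ".join(f"{name}={value}" for name, value in asdict(cfg).items())


tpsr_default = TPSRSearchConfig()
e2e_default = E2EBaselineConfig()

print(describe(tpsr_default))
print(describe(e2e_default))
```

Freezing the dataclasses keeps the recorded defaults from being changed by accident during a run. A variant setting can be created with `dataclasses.replace` instead.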