Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

GSR: A Generalized Symbolic Regression Approach

Authors: Tony Tohme, Dehong Liu, Kamal Youcef-Toumi

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We run a series of numerical experiments on the well-known SR benchmark datasets and show that our proposed method is competitive with many strong SR methods. Finally, we introduce SymSet, a new SR benchmark problem set that is more challenging than existing benchmarks.
Researcher Affiliation Collaboration Tony Tohme (EMAIL), Massachusetts Institute of Technology; Dehong Liu (EMAIL), Mitsubishi Electric Research Laboratories; Kamal Youcef-Toumi (EMAIL), Massachusetts Institute of Technology
Pseudocode Yes Algorithm 1 outlines the overall process for solving the constrained Lasso optimization problem in Equation 6, for a given matrix A, regularization parameter λ, penalty parameter ρ, and initial guesses w0, z0, u0. A pseudocode of our proposed GSR algorithm is provided in Appendix A. Algorithm 2: GP Procedure for GSR
Open Source Code No The paper does not explicitly state that the authors are releasing their source code for the GSR methodology. It mentions third-party tools like PySR and gplearn but not their own code.
Open Datasets Yes We evaluate our proposed GSR method through a series of numerical experiments on a number of common SR benchmark datasets. In particular, we compare our approach to existing state-of-the-art methods using three popular SR benchmark problem sets: Nguyen (Uy et al., 2011), Jin (Jin et al., 2019), and Neat (Trujillo et al., 2016). In addition, we demonstrate the benefits of our proposed method on the recently introduced SR benchmark dataset called Livermore (Mundhenk et al., 2021).
Dataset Splits Yes Table 19: Specifications of the Symbolic Regression (SR) benchmark problems. Input variables are denoted by x for 1-dimensional problems, and by (x1, x2) for 2-dimensional problems. U(a, b, c) indicates c random points uniformly sampled between a and b for every input variable; different random seeds are used for the training and test sets. E(a, b, c) indicates c evenly spaced points between a and b for every input variable; the same points are used for the training and test sets, except Neat-6, which uses E(1, 120, 120) as test set, and the Jin tests, which use U(−3, 3, 30) as test set. Table 20: Specifications of the Symbolic Regression (SR) benchmark problems. Input variables are denoted by x for 1-dimensional problems, by (x1, x2) for 2-dimensional problems, and by (x1, x2, x3) for 3-dimensional problems. U(a, b, c) indicates c random points uniformly sampled between a and b for every input variable; different random seeds are used for the training and test sets.
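The U(a, b, c) / E(a, b, c) sampling notation quoted above can be illustrated with a short sketch. The function names and the example interval are mine, not the paper's; the sketch only mirrors the stated conventions (uniform sampling with different seeds for train/test, versus shared evenly spaced points).

```python
import numpy as np

def U(a, b, c, seed=None):
    """U(a, b, c): c points uniformly sampled in [a, b] per input variable.
    Different seeds are used for the training and test sets."""
    rng = np.random.default_rng(seed)
    return rng.uniform(a, b, size=c)

def E(a, b, c):
    """E(a, b, c): c evenly spaced points in [a, b] per input variable.
    The same points serve as both training and test sets."""
    return np.linspace(a, b, c)

# Illustrative split: 20 uniform training points and 20 uniform test
# points drawn with different seeds (interval chosen for the example).
x_train = U(-1, 1, 20, seed=0)
x_test = U(-1, 1, 20, seed=1)
```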
Hardware Specification No The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models.
Software Dependencies No The paper mentions "Python/Julia" and "Koza-style SR method in Python" in the context of other benchmark methods (PySR and gplearn), but does not specify the version numbers of any software dependencies used for their own GSR implementation.
Experiment Setup Yes Throughout our experiments, we adopt the following hyperparameter values. For GP, we use a population size Np = 30, and we allow for np = 10 surviving individuals per generation. We perform crossover with probability Pc = 1/4 and allow for only 2 parents to be involved in the process (i.e. new individuals are formed by combining basis functions from two randomly chosen parent individuals). We apply mutation with probability Pm = 1/4 and allow for 3 basis functions (randomly selected from an individual) to be mutated (i.e. to be discarded and replaced by completely new basis functions). We generate a (completely new) random individual with probability Pr = 1/2. For ADMM, we use a regularizer λ = 0.4 and a penalty ρ = 0.1. The algorithm terminates when the ℓ2-norm of the difference between the weight vectors from two consecutive iterations falls below a threshold of δ = 10⁻⁵. Regarding initial conditions, we use w0 = b1/sqrt(Mφ+Mψ) (where b denotes a normalized vector), z0 = 1 = [1 1]ᵀ, u0 = 0 = [0 0]ᵀ. For GSR, we allow for a maximum of Mφ = 15 basis functions φ(·) for each expression of f(·) ... We allow for a maximum of Mψ = 1 basis function ψ(·) for each expression of g(·) ... GSR terminates when a candidate expression achieves an RMSE lower than a threshold with a starting value of ϵ = 10⁻⁶. All hyperparameter values are summarized in Table 9.
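The quoted ADMM settings can be illustrated with a minimal sketch. This is a generic ADMM solver for the unconstrained Lasso, not the constrained formulation of the paper's Equation 6 (whose constraints are not reproduced in this report); the values λ = 0.4, ρ = 0.1, δ = 10⁻⁵ and the stopping rule follow the quote above, while the function and variable names, the toy problem, and the iteration cap are mine.

```python
import numpy as np

def soft_threshold(v, k):
    """Elementwise soft-thresholding, the proximal operator of k*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, y, lam=0.4, rho=0.1, delta=1e-5, max_iter=5000):
    """Generic ADMM for min_w 0.5*||A w - y||^2 + lam*||w||_1.
    Hyperparameters mirror the quoted setup; the paper's Equation 6 adds
    constraints that are not modeled in this sketch."""
    m, n = A.shape
    w = np.zeros(n)
    z = np.ones(n)   # z0 = 1, as in the quoted initial conditions
    u = np.zeros(n)  # u0 = 0
    AtA = A.T @ A + rho * np.eye(n)  # cached for the repeated w-update
    Aty = A.T @ y
    for _ in range(max_iter):
        w_prev = w
        w = np.linalg.solve(AtA, Aty + rho * (z - u))  # ridge-like w-update
        z = soft_threshold(w + u, lam / rho)           # sparsifying z-update
        u = u + w - z                                  # dual ascent on u
        # Quoted stopping rule: l2-norm of successive weight differences < delta.
        if np.linalg.norm(w - w_prev) < delta:
            break
    return z  # the sparse iterate

# Toy usage: recover a sparse weight vector from noise-free measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
w_true = np.zeros(10)
w_true[[1, 4]] = [2.0, -3.0]
w_hat = admm_lasso(A, A @ w_true, lam=0.4, rho=0.1)
```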