Deep Generative Symbolic Regression

Authors: Samuel Holt, Zhaozhi Qian, Mihaela van der Schaar

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we show that DGSR achieves a higher recovery rate of true equations in the setting of a larger number of input variables, and it is more computationally efficient at inference time than state-of-the-art RL symbolic regression solutions. (Section 5, Experiments and Evaluation)
Researcher Affiliation | Academia | Samuel Holt (University of Cambridge, sih31@cam.ac.uk); Zhaozhi Qian (University of Cambridge, zq224@maths.cam.ac.uk); Mihaela van der Schaar (University of Cambridge and The Alan Turing Institute, mv472@cam.ac.uk)
Pseudocode | Yes | Furthermore, we provide pseudocode for DGSR in Appendix D and show empirically that other optimization algorithms can be used, with an ablation of these in Section 5.2 and Appendix E.
Open Source Code | Yes | Additionally, the code is available at https://github.com/samholt/DeepGenerativeSymbolicRegression, and a broader research-group codebase is available at https://github.com/vanderschaarlab/DeepGenerativeSymbolicRegression.
Open Datasets | Yes | We evaluate DGSR on a set of common equations in natural sciences from the standard SR benchmark problem sets and on a problem set with a large number of input variables (d = 12). ... We use equations from the Feynman SR database (Udrescu & Tegmark, 2020) ... We also benchmark on SRBench (La Cava et al., 2021) ...
Dataset Splits | Yes | Additionally, we construct a validation set of 100 equations using the same pre-training setup with a different random seed, and check for and remove any of the validation equations from the pre-training set. (A minimal sketch of this split is given below the table.)
Hardware Specification | Yes | This work was performed using an Intel Core i9-12900K CPU @ 3.20GHz, 64 GB RAM, and an Nvidia RTX 3090 GPU with 24 GB of memory.
Software Dependencies | No | The paper mentions software such as PyTorch, the Adam optimizer, DEAP, and Sympy, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | During pre-training we use the vanilla policy gradient (VPG) loss function to train the conditional generator parameters θ. This is detailed in Appendices D and C, and we use the hyperparameters: batch size of k = 500 equations to sample, mini-batch of t = 5 datasets, EWMA coefficient α = 0.5, entropy weight λH = 0.003, minimum equation length = 4, maximum equation length = 30, Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001, and an early stopping patience of 100 iterations (of a mini-batch). The hyperparameters for inference time are: batch size of k = 500 equations to sample, entropy weight λH = 0.003, minimum equation length = 4, maximum equation length = 30, PQT queue size = 10, sample selection size = 1, GP generations per iteration = 25, GP crossover probability = 0.5, GP mutation probability = 0.5, GP tournament size = 5, GP mutate tree maximum = 3, and Adam optimizer (Kingma & Ba, 2014) with a learning rate of 0.001. We also used ϵ = 0.02 for the risk-seeking quantile parameter. (These values are collected in the configuration sketch below the table.)
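
The Dataset Splits row describes sampling a 100-equation validation set with a different random seed and removing any overlap from the pre-training pool. Below is a minimal sketch of that deduplication step, assuming a hypothetical sample_equation helper standing in for the paper's pre-training equation sampler; it is an illustration, not the authors' implementation.

```python
import random


def sample_equation(rng: random.Random) -> str:
    # Hypothetical stand-in for DGSR's pre-training equation sampler:
    # draws a random prefix-notation token sequence within the stated length bounds.
    tokens = rng.choices(["add", "mul", "sin", "cos", "x1", "x2", "c"],
                         k=rng.randint(4, 30))
    return " ".join(tokens)


def build_validation_split(pretrain_equations, n_val=100, seed=1):
    """Sample a validation set with a different seed and remove any of its
    equations from the pre-training set, as described in the paper."""
    rng = random.Random(seed)  # a seed different from the pre-training one
    val_set = set()
    while len(val_set) < n_val:
        val_set.add(sample_equation(rng))
    # Drop validation equations from the pre-training pool to avoid leakage.
    deduped_pretrain = [eq for eq in pretrain_equations if eq not in val_set]
    return sorted(val_set), deduped_pretrain
```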
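
For readability, the hyperparameters quoted in the Experiment Setup row are collected below as plain Python dictionaries. The grouping and key names are illustrative assumptions rather than the authors' actual configuration schema; the values are taken directly from the quote.

```python
# Pre-training hyperparameters (VPG loss on the conditional generator parameters).
PRETRAIN_CONFIG = {
    "loss": "vanilla_policy_gradient",
    "batch_size_equations": 500,      # k: equations sampled per iteration
    "minibatch_datasets": 5,          # t: datasets per mini-batch
    "ewma_alpha": 0.5,
    "entropy_weight": 0.003,          # lambda_H
    "min_equation_length": 4,
    "max_equation_length": 30,
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "early_stopping_patience": 100,   # iterations of a mini-batch
}

# Inference-time hyperparameters.
INFERENCE_CONFIG = {
    "batch_size_equations": 500,
    "entropy_weight": 0.003,
    "min_equation_length": 4,
    "max_equation_length": 30,
    "pqt_queue_size": 10,
    "sample_selection_size": 1,
    "gp_generations_per_iteration": 25,
    "gp_crossover_prob": 0.5,
    "gp_mutation_prob": 0.5,
    "gp_tournament_size": 5,
    "gp_mutate_tree_max": 3,
    "optimizer": "adam",
    "learning_rate": 1e-3,
    "risk_seeking_quantile_eps": 0.02,  # epsilon for the risk-seeking quantile
}
```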