End-to-end Symbolic Regression with Transformers

Authors: Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model on problems from the SRBench benchmark and show that our model approaches the performance of state-of-the-art genetic programming with several orders of magnitude faster inference. In this section, we present the results of our model. We begin by studying in-domain accuracy, then present results on out-of-domain datasets.
Researcher Affiliation | Collaboration | 1 Meta AI; 2 ISIR MLIA, Sorbonne Université; 3 Department of Physics, École Normale Supérieure
Pseudocode | No | The paper describes procedures in narrative text but does not include any structured pseudocode or algorithm blocks with explicit labels.
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | Yes | We evaluate our method on the recently released benchmark SRBench [7]. Its repository contains a set of 252 regression datasets from the Penn Machine Learning Benchmark (PMLB) [35] in addition to 14 open-source SR and ML baselines.
Dataset Splits | Yes | We hold out a validation set of 10^4 examples from the same generator, and train our models until the accuracy on the validation set saturates (around 50 epochs of 3M examples).
Hardware Specification | Yes | On 32 GPUs with 32GB memory each, one epoch is processed in about half an hour.
Software Dependencies | No | The paper mentions software libraries like sympytorch and functorch, but it does not specify their version numbers or the versions of other key software components used in the experiments.
Experiment Setup | Yes | We optimize a cross-entropy loss with the Adam optimizer, warming up the learning rate from 10^-7 to 2·10^-4 over the first 10,000 steps, then decaying it as the inverse square root of the number of steps, following [23]. We use a sequence-to-sequence Transformer architecture [23] with 16 attention heads and an embedding dimension of 512, containing a total of 86M parameters... we use 4 layers in the encoder and 16 in the decoder. (A hedged PyTorch sketch of this schedule and architecture follows the table.)
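The Experiment Setup row pins down the optimization schedule and the model shape, so a short sketch can make those numbers easier to compare against a reimplementation. The block below is not the authors' released code: the vocabulary size, feed-forward width, dropout, and the `lr_factor` helper are illustrative assumptions; only the figures quoted above (embedding dimension 512, 16 attention heads, 4 encoder and 16 decoder layers, Adam with a 10,000-step warmup from 10^-7 to 2·10^-4 followed by inverse-square-root decay, cross-entropy loss) come from the paper.

```python
import torch
import torch.nn as nn

# Figures quoted in the paper: d_model=512, 16 heads, 4 encoder / 16 decoder
# layers, Adam, warmup 1e-7 -> 2e-4 over 10,000 steps, then inverse-sqrt decay.
# Everything else (feed-forward width, dropout, vocabulary size) is a
# placeholder assumption, not the authors' configuration.
D_MODEL, N_HEADS = 512, 16
N_ENC_LAYERS, N_DEC_LAYERS = 4, 16
WARMUP_STEPS, LR_MIN, LR_MAX = 10_000, 1e-7, 2e-4
VOCAB_SIZE = 1024  # placeholder output vocabulary for expression tokens

model = nn.Transformer(
    d_model=D_MODEL,
    nhead=N_HEADS,
    num_encoder_layers=N_ENC_LAYERS,
    num_decoder_layers=N_DEC_LAYERS,
    batch_first=True,
)
head = nn.Linear(D_MODEL, VOCAB_SIZE)  # projects decoder states to token logits
criterion = nn.CrossEntropyLoss()      # token-level cross-entropy loss
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(head.parameters()), lr=LR_MAX
)

def lr_factor(step: int) -> float:
    """Linear warmup from LR_MIN to LR_MAX, then decay as 1/sqrt(step).

    Returned as a multiplier on the base lr (LR_MAX), as LambdaLR expects.
    """
    if step < WARMUP_STEPS:
        return (LR_MIN + (LR_MAX - LR_MIN) * step / WARMUP_STEPS) / LR_MAX
    return (WARMUP_STEPS / step) ** 0.5

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_factor)
```

In a training loop, `optimizer.step()` followed by `scheduler.step()` once per batch reproduces the warmup-then-decay shape described in the quote; the embedding layers that feed the numeric inputs and expression tokens into the Transformer are omitted here for brevity.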