End-to-end Symbolic Regression with Transformers
Authors: Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on problems from the SRBench benchmark and show that our model approaches the performance of state-of-the-art genetic programming with several orders of magnitude faster inference. In this section, we present the results of our model. We begin by studying in-domain accuracy, then present results on out-of-domain datasets. |
| Researcher Affiliation | Collaboration | 1Meta AI 2ISIR MLIA, Sorbonne Université 3Department of Physics, Ecole Normale Supérieure |
| Pseudocode | No | The paper describes procedures in narrative text but does not include any structured pseudocode or algorithm blocks with explicit labels. |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We evaluate our method on the recently released benchmark SRBench [7]. Its repository contains a set of 252 regression datasets from the Penn Machine Learning Benchmark (PMLB) [35] in addition to 14 open-source SR and ML baselines. |
| Dataset Splits | Yes | We hold out a validation set of 10^4 examples from the same generator, and train our models until the accuracy on the validation set saturates (around 50 epochs of 3M examples). |
| Hardware Specification | Yes | On 32 GPUs with 32GB memory each, one epoch is processed in about half an hour. |
| Software Dependencies | No | The paper mentions software libraries like sympytorch and functorch, but it does not specify their version numbers or the versions of other key software components used in the experiments. |
| Experiment Setup | Yes | We optimize a cross-entropy loss with the Adam optimizer, warming up the learning rate from 10^-7 to 2·10^-4 over the first 10,000 steps, then decaying it as the inverse square root of the number of steps, following [23]. We use a sequence-to-sequence Transformer architecture [23] with 16 attention heads and an embedding dimension of 512, containing a total of 86M parameters... we use 4 layers in the encoder and 16 in the decoder. (See the schedule sketch below the table.) |
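
The learning-rate schedule quoted in the Experiment Setup row (linear warm-up from 10^-7 to 2·10^-4 over the first 10,000 steps, followed by inverse-square-root decay) can be illustrated with a minimal sketch. This is an assumption-based illustration of a standard warm-up-then-inverse-sqrt schedule; the constant names and the standalone function below are hypothetical and not taken from the authors' code.

```python
# Hypothetical sketch of the schedule described in the paper:
# linear warm-up from 1e-7 to 2e-4 over the first 10,000 steps,
# then inverse-square-root decay. Names are illustrative only.

WARMUP_STEPS = 10_000
LR_INIT = 1e-7
LR_PEAK = 2e-4


def learning_rate(step: int) -> float:
    """Return the learning rate for a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warm-up from LR_INIT to LR_PEAK.
        return LR_INIT + (LR_PEAK - LR_INIT) * step / WARMUP_STEPS
    # Inverse-square-root decay, matching LR_PEAK at the warm-up boundary.
    return LR_PEAK * (WARMUP_STEPS / step) ** 0.5


if __name__ == "__main__":
    for s in (1, 5_000, 10_000, 100_000, 1_000_000):
        print(f"step {s:>9}: lr = {learning_rate(s):.2e}")
```

In practice a function like this is typically wrapped in a scheduler such as `torch.optim.lr_scheduler.LambdaLR`; the paper does not state which implementation the authors used.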