Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
End-to-end Symbolic Regression with Transformers
Authors: Pierre-Alexandre Kamienny, Stéphane d'Ascoli, Guillaume Lample, François Charton
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model on problems from the SRBench benchmark and show that our model approaches the performance of state-of-the-art genetic programming with several orders of magnitude faster inference. In this section, we present the results of our model. We begin by studying in-domain accuracy, then present results on out-of-domain datasets. |
| Researcher Affiliation | Collaboration | (1) Meta AI; (2) ISIR MLIA, Sorbonne Université; (3) Department of Physics, Ecole Normale Supérieure |
| Pseudocode | No | The paper describes procedures in narrative text but does not include any structured pseudocode or algorithm blocks with explicit labels. |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We evaluate our method on the recently released benchmark SRBench[7]. Its repository contains a set of 252 regression datasets from the Penn Machine Learning Benchmark (PMLB)[35] in addition to 14 open-source SR and ML baselines. |
| Dataset Splits | Yes | We hold out a validation set of 10^4 examples from the same generator, and train our models until the accuracy on the validation set saturates (around 50 epochs of 3M examples). |
| Hardware Specification | Yes | On 32 GPU with 32GB memory each, one epoch is processed in about half an hour. |
| Software Dependencies | No | The paper mentions software libraries like sympytorch and functorch, but it does not specify their version numbers or the versions of other key software components used in the experiments. |
| Experiment Setup | Yes | We optimize a cross-entropy loss with the Adam optimizer, warming up the learning rate from 10^-7 to 2×10^-4 over the first 10,000 steps, then decaying it as the inverse square root of the number of steps, following [23]. We use a sequence to sequence Transformer architecture [23] with 16 attention heads and an embedding dimension of 512, containing a total of 86M parameters... we use 4 layers in the encoder and 16 in the decoder. |
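
The learning-rate schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the warm-up from 10^-7 to 2×10^-4 is linear and that the decay is scaled so the rate is continuous at the end of warm-up, neither of which the quoted text states. The names `INIT_LR`, `PEAK_LR`, `WARMUP_STEPS`, and `learning_rate` are hypothetical.

```python
import math

# Values taken from the quoted experiment setup.
INIT_LR = 1e-7         # learning rate at step 0
PEAK_LR = 2e-4         # learning rate reached at the end of warm-up
WARMUP_STEPS = 10_000  # length of the warm-up phase


def learning_rate(step: int) -> float:
    """Return the learning rate at a given optimizer step.

    Assumptions (not stated in the source): the warm-up is linear, and the
    inverse-square-root decay is normalized to equal PEAK_LR at WARMUP_STEPS.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        # Linear interpolation between INIT_LR and PEAK_LR.
        return INIT_LR + (PEAK_LR - INIT_LR) * step / WARMUP_STEPS
    # Decay proportional to 1 / sqrt(step).
    return PEAK_LR * math.sqrt(WARMUP_STEPS / step)


if __name__ == "__main__":
    for s in (0, 5_000, 10_000, 40_000, 160_000):
        print(f"step {s:>7}: lr = {learning_rate(s):.3e}")
```

Under these assumptions the rate halves each time the step count quadruples after warm-up. A function of this shape could be wrapped in a framework's per-step scheduler, though the exact integration depends on the training code.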