Deep symbolic regression for recurrence prediction
Authors: Stéphane D’Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Mathematica functions for recurrence prediction. We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, e.g. bessel0(x) ≈ (sin(x)+cos(x))/√(πx) and 1.644934 ≈ π²/6. |
| Researcher Affiliation | Collaboration | ¹Department of Physics, École Normale Supérieure, Paris; ²Meta AI, Paris; ³Laboratoire d'Informatique de Paris 6, Sorbonne Université, Paris. |
| Pseudocode | No | The paper describes methods and processes in text but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | The Online Encyclopedia of Integer Sequences (OEIS) is an online database containing over 300,000 integer sequences. It is tempting to directly use OEIS as a testbed for prediction; however, many sequences in OEIS do not have a closed-form recurrence relation, such as the stops on the New York City Broadway line subway (A000053). (Sloane, 2007) |
| Dataset Splits | Yes | After each epoch, we evaluate the in-distribution performance of our models on a held-out dataset of 10,000 equations. |
| Hardware Specification | Yes | On 16 GPUs with Volta architecture and 32GB memory, one epoch is processed in about an hour. |
| Software Dependencies | No | The paper describes the model architecture and general tools used (e.g., Transformer), but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | Similarly to Lample & Charton (2019), we use a simple Transformer architecture (Vaswani et al., 2017) with 8 hidden layers, 8 attention heads and an embedding dimension of 512 both for the encoder and decoder. Training and evaluation: The tokens generated by the model are supervised via a cross-entropy loss. We use the Adam optimizer, warming up the learning rate from 10⁻⁷ to 2·10⁻⁴ over the first 10,000 steps, then decaying it as the inverse square root of the number of steps, following (Vaswani et al., 2017). We train each model for a minimum of 250 epochs, each epoch containing 5M equations in batches of 512. We provide the values of the parameters of the generator in Table 5. |
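
The two float-model approximations quoted in the Research Type row can be checked numerically. The sketch below is not taken from the paper; it assumes NumPy and SciPy are available and uses `scipy.special.j0` for the function the paper writes as bessel0.

```python
# Sanity check (not the authors' code) for the approximations quoted above,
# assuming NumPy and SciPy are installed.
import numpy as np
from scipy.special import j0  # Bessel function of the first kind, order 0

# bessel0(x) ≈ (sin(x) + cos(x)) / sqrt(pi * x) holds asymptotically for large x.
x = np.linspace(10.0, 100.0, 1000)
approx = (np.sin(x) + np.cos(x)) / np.sqrt(np.pi * x)
print("max |J0(x) - approx| on [10, 100]:", np.abs(j0(x) - approx).max())

# 1.644934 ≈ pi^2 / 6, i.e. zeta(2).
print("pi^2 / 6 =", np.pi ** 2 / 6)  # ≈ 1.6449340668
```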
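
Since the Open Datasets row points to OEIS, anyone reproducing the OEIS evaluation needs a way to pull a sequence's terms. The helper below is an illustrative sketch, not part of the paper's pipeline: it assumes the standard OEIS b-file layout (one "index term" pair per line, `#` for comments) served at `https://oeis.org/A<id>/b<id>.txt`.

```python
# Illustrative sketch (not from the paper): download the terms of an OEIS
# sequence from its b-file, assuming the standard b-file format.
from urllib.request import urlopen

def fetch_oeis_terms(seq_id: str, max_terms: int = 25) -> list[int]:
    """Return the first `max_terms` terms of an OEIS sequence, e.g. 'A000045'."""
    url = f"https://oeis.org/{seq_id}/b{seq_id[1:]}.txt"
    terms = []
    with urlopen(url) as response:
        for raw in response.read().decode("utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank and comment lines
            _, value = line.split()[:2]
            terms.append(int(value))
            if len(terms) >= max_terms:
                break
    return terms

print(fetch_oeis_terms("A000045"))  # Fibonacci numbers
```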
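
The Experiment Setup row names the architecture and the learning-rate schedule but no code is released. The sketch below is one possible reading of that description, not the authors' training code: it assumes PyTorch, instantiates `torch.nn.Transformer` with the stated sizes (all other hyperparameters left at PyTorch defaults), and implements the linear warmup from 10⁻⁷ to 2·10⁻⁴ over 10,000 steps followed by inverse-square-root decay via `LambdaLR`.

```python
# Minimal sketch (assumptions, not the released training code) of the
# architecture and learning-rate schedule described in the Experiment Setup row.
import torch
from torch.optim.lr_scheduler import LambdaLR

WARMUP_STEPS = 10_000
LR_MIN, LR_MAX = 1e-7, 2e-4

# 8 encoder / 8 decoder layers, 8 attention heads, embedding dimension 512.
model = torch.nn.Transformer(d_model=512, nhead=8,
                             num_encoder_layers=8, num_decoder_layers=8)
optimizer = torch.optim.Adam(model.parameters(), lr=LR_MAX)

def lr_factor(step: int) -> float:
    """Linear warmup from LR_MIN to LR_MAX, then inverse-square-root decay."""
    if step < WARMUP_STEPS:
        return (LR_MIN + (LR_MAX - LR_MIN) * step / WARMUP_STEPS) / LR_MAX
    return (WARMUP_STEPS / step) ** 0.5

scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)
# In the training loop, call optimizer.step() then scheduler.step() once per batch;
# the cross-entropy loss on generated tokens would be torch.nn.CrossEntropyLoss().
```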