Deep Symbolic Regression for Recurrent Sequences

Authors: Stéphane d’Ascoli, Pierre-Alexandre Kamienny, Guillaume Lample, François Charton

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our integer model on a subset of OEIS sequences, and show that it outperforms built-in Mathematica functions for recurrence prediction. We also demonstrate that our float model is able to yield informative approximations of out-of-vocabulary functions and constants, e.g. bessel0(x) ≈ (sin(x)+cos(x))/√(πx) and 1.644934 ≈ π²/6. (A numerical check of these two approximations is sketched below the table.)
Researcher Affiliation | Collaboration | 1 Department of Physics, École Normale Supérieure, Paris; 2 Meta AI, Paris; 3 Laboratoire d'Informatique de Paris 6, Sorbonne Université, Paris.
Pseudocode | No | The paper describes methods and processes in text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | Yes | The Online Encyclopedia of Integer Sequences (OEIS) is an online database containing over 300,000 integer sequences. It is tempting to directly use OEIS as a testbed for prediction; however, many sequences in OEIS do not have a closed-form recurrence relation, such as the stops on the New York City Broadway line subway (A000053). (Sloane, 2007) (A sketch of fetching an OEIS sequence is given below the table.)
Dataset Splits | Yes | After each epoch, we evaluate the in-distribution performance of our models on a held-out dataset of 10,000 equations.
Hardware Specification | Yes | On 16 GPUs with Volta architecture and 32 GB memory, one epoch is processed in about an hour.
Software Dependencies | No | The paper describes the model architecture and general tools used (e.g., Transformer), but does not provide specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | Similarly to Lample & Charton (2019), we use a simple Transformer architecture (Vaswani et al., 2017) with 8 hidden layers, 8 attention heads and an embedding dimension of 512, both for the encoder and the decoder. Training and evaluation: the tokens generated by the model are supervised via a cross-entropy loss. We use the Adam optimizer, warming up the learning rate from 10⁻⁷ to 2·10⁻⁴ over the first 10,000 steps, then decaying it as the inverse square root of the number of steps, following (Vaswani et al., 2017). We train each model for a minimum of 250 epochs, each epoch containing 5M equations in batches of 512. We provide the values of the parameters of the generator in Table 5.
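
The Experiment Setup row can be summarized as a short PyTorch sketch: an 8-layer, 8-head, 512-dimensional encoder-decoder Transformer trained with a cross-entropy loss and an Adam learning rate warmed up from 10⁻⁷ to 2·10⁻⁴ over 10,000 steps, then decayed as the inverse square root of the step count. This is a minimal sketch, not the authors' code: the vocabulary size, sequence lengths, number of demo steps and the `make_batch` generator are placeholders invented for illustration.

```python
# Minimal sketch of the quoted setup; VOCAB, sequence lengths and
# make_batch are placeholders, not details taken from the paper.
import math
import torch
import torch.nn as nn

VOCAB = 10_000                          # placeholder vocabulary size
D_MODEL, N_HEADS, N_LAYERS = 512, 8, 8  # embedding dim, attention heads, layers
WARMUP, LR_MAX, LR_MIN = 10_000, 2e-4, 1e-7

model = nn.Transformer(d_model=D_MODEL, nhead=N_HEADS,
                       num_encoder_layers=N_LAYERS, num_decoder_layers=N_LAYERS,
                       batch_first=True)
embed = nn.Embedding(VOCAB, D_MODEL)
head = nn.Linear(D_MODEL, VOCAB)
criterion = nn.CrossEntropyLoss()
params = list(model.parameters()) + list(embed.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=LR_MAX)

def lr_at(step: int) -> float:
    """Linear warm-up from LR_MIN to LR_MAX over WARMUP steps,
    then inverse-square-root decay, as described in the setup."""
    if step < WARMUP:
        return LR_MIN + (LR_MAX - LR_MIN) * step / WARMUP
    return LR_MAX * math.sqrt(WARMUP / step)

def make_batch(batch_size=32, src_len=30, tgt_len=20):
    """Stand-in for the paper's equation generator: random token batches.
    The paper uses batches of 512 sequence/recurrence pairs."""
    src = torch.randint(0, VOCAB, (batch_size, src_len))
    tgt = torch.randint(0, VOCAB, (batch_size, tgt_len))
    return src, tgt

for step in range(1, 4):                              # a few demo steps only
    for group in optimizer.param_groups:
        group["lr"] = lr_at(step)
    src, tgt = make_batch()
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]         # teacher forcing
    mask = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
    logits = head(model(embed(src), embed(tgt_in), tgt_mask=mask))
    loss = criterion(logits.reshape(-1, VOCAB), tgt_out.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f"step {step}: lr={lr_at(step):.2e} loss={loss.item():.3f}")
```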
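The two approximations quoted in the Research Type row can also be checked numerically: (sin(x)+cos(x))/√(πx) is the standard large-x asymptotic form of the Bessel function J₀, and π²/6 = 1.6449340668… is the Basel constant. A quick sanity-check sketch, assuming NumPy and SciPy are available:

```python
# Numerical check of the two float-model approximations quoted above.
import numpy as np
from scipy.special import j0   # Bessel function of the first kind, order 0

x = np.linspace(10.0, 50.0, 9)
approx = (np.sin(x) + np.cos(x)) / np.sqrt(np.pi * x)
print(np.max(np.abs(j0(x) - approx)))   # small for large x
print(abs(np.pi**2 / 6 - 1.644934))     # ≈ 7e-8
```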
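For the Open Datasets row, the terms of an OEIS sequence can be pulled from its per-sequence "b-file" (plain-text n, a(n) pairs). The sketch below uses A000053, the example cited in that row; the URL pattern and parsing reflect my assumptions about the OEIS site layout and are not part of the paper.

```python
# Fetch the terms of an OEIS sequence from its b-file (assumed URL layout).
import urllib.request

def oeis_terms(seq_id: str) -> list[int]:
    """Return the listed terms of an OEIS sequence, e.g. oeis_terms('A000053')."""
    url = f"https://oeis.org/{seq_id}/b{seq_id[1:]}.txt"
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    terms = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comment lines
        _, value = line.split()[:2]       # each data line is "n a(n)"
        terms.append(int(value))
    return terms

print(oeis_terms("A000053")[:10])   # first terms of the Broadway-line sequence
```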