ODEFormer: Symbolic Regression of Dynamical Systems with Transformers
Authors: Stéphane d'Ascoli, Sören Becker, Philippe Schwaller, Alexander Mathis, Niki Kilbertus
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive evaluations on two datasets: (i) the existing Strogatz dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully curated from the literature to provide a more holistic benchmark. ODEFormer consistently outperforms existing methods while displaying substantially improved robustness to noisy and irregularly sampled observations, as well as faster inference. |
| Researcher Affiliation | Academia | Stéphane d'Ascoli (EPFL, stephane.dascoli@gmail.com); Sören Becker (Helmholtz Munich, Munich Center for Machine Learning, TU Munich, soren.a.becker@gmail.com); Alexander Mathis (EPFL); Philippe Schwaller (EPFL); Niki Kilbertus (Helmholtz Munich, Munich Center for Machine Learning, TU Munich) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | We release our code, model and benchmark at https://github.com/sdascoli/odeformer. ...all code, model weights, and created benchmark datasets will be made publicly available at https://github.com/sdascoli/odeformer together with notebooks to directly reproduce key results, as well as a pip-installable package for easy usage. |
| Open Datasets | Yes | Faced with the lack of benchmarks... we also introduce ODEBench, a more extensive dataset of 63 ODEs curated from the literature... We publicly release ODEBench with descriptions, sources of all equations, and well-integrated solution trajectories; more details are in Appendix A. ...For this we first consider the Strogatz dataset, included in the Penn Machine Learning Benchmark (PMLB) database (La Cava et al., 2021). |
| Dataset Splits | Yes | For each combination of hyperparameters, the model is fitted on the first 70% and scored on the remaining 30% of a trajectory. (A minimal sketch of this split appears below the table.) |
| Hardware Specification | Yes | When run on a single NVIDIA A100 GPU with 80GB memory and 8 CPU cores, ODEFormer's training process takes roughly three days. |
| Software Dependencies | No | The paper mentions software like 'scipy', 'pysindy', and 'scikit-learn' but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The model comprises 16 attention heads and an embedding dimension of 512, leading to a total parameter count of 86M. As observed by Charton (2022), we find that optimal performance is achieved in an asymmetric architecture, using 4 layers in the encoder and 16 in the decoder. ...We optimize the cross-entropy loss... We use the Adam optimizer (with default parameters suggested by Kingma & Ba (2015)), with a learning rate warming up from 10⁻⁷ to 2×10⁻⁴ across the initial 10,000 steps and a subsequent decay governed by a cosine schedule for the next 300,000 steps. The annealing cycle then restarts with a damping factor of 3/2... resulting in approximately 800,000 optimization steps. We do not use any regularization such as weight decay or dropout. To efficiently manage the greatly varying input sequence lengths, we group examples of similar lengths in batches, with the constraint that each batch contains 10,000 tokens. (Sketches of the learning-rate schedule and the length-grouped batching appear below the table.) |
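
The 70/30 trajectory split quoted under Dataset Splits can be illustrated with a minimal sketch. Here `fit_fn`, `predict_fn`, and `score_candidate` are hypothetical placeholders for whatever candidate regressor is being tuned; only the idea of fitting on the first 70% of a trajectory and scoring (e.g. via R²) on the remaining 30% is taken from the paper.

```python
import numpy as np

def score_candidate(fit_fn, predict_fn, t, y, train_frac=0.7):
    """Fit a candidate model on the first 70% of a trajectory and
    score it with R^2 on the held-out final 30%.

    t: (n,) array of time points; y: (n, d) array of observed states.
    fit_fn and predict_fn are hypothetical stand-ins for the regressor
    being tuned; the 70/30 prefix split is the detail from the paper.
    """
    n_train = int(train_frac * len(t))
    model = fit_fn(t[:n_train], y[:n_train])          # fit on the prefix
    y_pred = predict_fn(model, t[n_train:])           # predict on the suffix
    ss_res = np.sum((y[n_train:] - y_pred) ** 2)
    ss_tot = np.sum((y[n_train:] - y[n_train:].mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot
```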
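
The optimizer schedule quoted under Experiment Setup (warmup from 10⁻⁷ to 2×10⁻⁴ over the first 10,000 steps, cosine decay over the next 300,000 steps, restarts damped by 3/2, roughly 800,000 steps in total) can be written as a per-step learning-rate function. How the 3/2 damping is applied at each restart is not spelled out in the quote; dividing the peak learning rate by 1.5 per cycle is an assumption of this sketch.

```python
import math

def lr_at_step(step, lr_min=1e-7, lr_max=2e-4,
               warmup_steps=10_000, decay_steps=300_000, damping=1.5):
    """Warmup-plus-cosine schedule with damped restarts (sketch).

    Assumption: each restart divides the peak learning rate by `damping`;
    the cycle length stays fixed at warmup_steps + decay_steps.
    """
    cycle_len = warmup_steps + decay_steps
    cycle, pos = divmod(step, cycle_len)
    peak = lr_max / (damping ** cycle)               # damped peak per cycle (assumption)
    if pos < warmup_steps:                           # linear warmup from lr_min to peak
        return lr_min + (peak - lr_min) * pos / warmup_steps
    t = (pos - warmup_steps) / decay_steps           # cosine decay back to lr_min
    return lr_min + 0.5 * (peak - lr_min) * (1 + math.cos(math.pi * t))
```

As a sanity check, `lr_at_step(0)` returns 1e-7 and `lr_at_step(10_000)` returns 2e-4 for the first cycle; around step 800,000 the schedule is in its third cycle.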
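
The length-grouped batching with a 10,000-token budget, also quoted under Experiment Setup, might look like the following greedy grouping. Treating the constraint as an upper bound on the summed token count per batch, and sorting by length before grouping, are assumptions of this sketch; `make_length_batches` is a hypothetical helper, not part of the released code.

```python
def make_length_batches(examples, max_tokens=10_000):
    """Group token sequences of similar length into batches whose total
    token count stays within max_tokens (assumed to be an upper bound).
    """
    batches, batch, batch_tokens = [], [], 0
    for ex in sorted(examples, key=len):     # similar lengths end up adjacent
        if batch and batch_tokens + len(ex) > max_tokens:
            batches.append(batch)
            batch, batch_tokens = [], 0
        batch.append(ex)
        batch_tokens += len(ex)
    if batch:
        batches.append(batch)
    return batches
```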