ODEFormer: Symbolic Regression of Dynamical Systems with Transformers
Authors: Stéphane d'Ascoli, Sören Becker, Philippe Schwaller, Alexander Mathis, Niki Kilbertus
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive evaluations on two datasets: (i) the existing Strogatz dataset featuring two-dimensional systems; (ii) ODEBench, a collection of one- to four-dimensional systems that we carefully curated from the literature to provide a more holistic benchmark. ODEFormer consistently outperforms existing methods while displaying substantially improved robustness to noisy and irregularly sampled observations, as well as faster inference. |
| Researcher Affiliation | Academia | Stéphane d'Ascoli (EPFL, stephane.dascoli@gmail.com); Sören Becker (Helmholtz Munich, Munich Center for Machine Learning, TU Munich, soren.a.becker@gmail.com); Alexander Mathis (EPFL); Philippe Schwaller (EPFL); Niki Kilbertus (Helmholtz Munich, Munich Center for Machine Learning, TU Munich) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | We release our code, model and benchmark at https://github.com/sdascoli/odeformer. ...all code, model weights, and created benchmark datasets will be made publicly available at https://github.com/sdascoli/odeformer together with notebooks to directly reproduce key results, as well as a pip-installable package for easy usage. |
| Open Datasets | Yes | Faced with the lack of benchmarks... we also introduce ODEBench, a more extensive dataset of 63 ODEs curated from the literature... We publicly release ODEBench with descriptions, sources of all equations, and well-integrated solution trajectories; more details are in Appendix A. ...For this we first consider the Strogatz dataset, included in the Penn Machine Learning Benchmark (PMLB) database (La Cava et al., 2021). |
| Dataset Splits | Yes | For each combination of hyperparameters, the model is fitted on the first 70% and scored on the remaining 30% of a trajectory. (A minimal sketch of this split appears below the table.) |
| Hardware Specification | Yes | When run on a single NVIDIA A100 GPU with 80GB memory and 8 CPU cores, ODEFormer's training process takes roughly three days. |
| Software Dependencies | No | The paper mentions software like 'scipy', 'pysindy', and 'scikit-learn' but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The model comprises 16 attention heads and an embedding dimension of 512, leading to a total parameter count of 86M. As observed by Charton (2022), we find that optimal performance is achieved in an asymmetric architecture, using 4 layers in the encoder and 16 in the decoder. ...We optimize the cross-entropy loss... We use the Adam optimizer (with default parameters suggested by Kingma & Ba (2015)), with a learning rate warming up from 10⁻⁷ to 2×10⁻⁴ across the initial 10,000 steps and a subsequent decay governed by a cosine schedule for the next 300,000 steps. The annealing cycle then restarts with a damping factor of 3/2... resulting in approximately 800,000 optimization steps. We do not use any regularization such as weight decay or dropout. To efficiently manage the greatly varying input sequence lengths, we group examples of similar lengths in batches, with the constraint that each batch contains 10,000 tokens. (Sketches of the learning-rate schedule and the length-grouped batching appear below the table.) |
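
The 70/30 trajectory split quoted under Dataset Splits can be illustrated with a minimal sketch. Here `fit_fn`, `predict_fn`, and `score_candidate` are hypothetical placeholders for whatever candidate regressor is being tuned; only the idea of fitting on the first 70% of a trajectory and scoring (e.g. via R²) on the remaining 30% is taken from the paper.

```python
import numpy as np

def score_candidate(fit_fn, predict_fn, t, y, train_frac=0.7):
    """Fit a candidate model on the first 70% of a trajectory and
    score it with R^2 on the held-out final 30%.

    t: (n,) array of time points; y: (n, d) array of observed states.
    fit_fn and predict_fn are hypothetical stand-ins for the regressor
    being tuned; the 70/30 prefix split is the detail from the paper.
    """
    n_train = int(train_frac * len(t))
    model = fit_fn(t[:n_train], y[:n_train])          # fit on the prefix
    y_pred = predict_fn(model, t[n_train:])           # predict on the suffix
    ss_res = np.sum((y[n_train:] - y_pred) ** 2)
    ss_tot = np.sum((y[n_train:] - y[n_train:].mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot
```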
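
The optimizer schedule quoted under Experiment Setup (warmup from 10⁻⁷ to 2×10⁻⁴ over the first 10,000 steps, cosine decay over the next 300,000 steps, restarts damped by 3/2, roughly 800,000 steps in total) can be written as a per-step learning-rate function. How the 3/2 damping is applied at each restart is not spelled out in the quote; dividing the peak learning rate by 1.5 per cycle is an assumption of this sketch.

```python
import math

def lr_at_step(step, lr_min=1e-7, lr_max=2e-4,
               warmup_steps=10_000, decay_steps=300_000, damping=1.5):
    """Warmup-plus-cosine schedule with damped restarts (sketch).

    Assumption: each restart divides the peak learning rate by `damping`;
    the cycle length stays fixed at warmup_steps + decay_steps.
    """
    cycle_len = warmup_steps + decay_steps
    cycle, pos = divmod(step, cycle_len)
    peak = lr_max / (damping ** cycle)               # damped peak per cycle (assumption)
    if pos < warmup_steps:                           # linear warmup from lr_min to peak
        return lr_min + (peak - lr_min) * pos / warmup_steps
    t = (pos - warmup_steps) / decay_steps           # cosine decay back to lr_min
    return lr_min + 0.5 * (peak - lr_min) * (1 + math.cos(math.pi * t))
```

As a sanity check, `lr_at_step(0)` returns 1e-7 and `lr_at_step(10_000)` returns 2e-4 for the first cycle; around step 800,000 the schedule is in its third cycle.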
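
The length-grouped batching with a 10,000-token budget, also quoted under Experiment Setup, might look like the following greedy grouping. Treating the constraint as an upper bound on the summed token count per batch, and sorting by length before grouping, are assumptions of this sketch; `make_length_batches` is a hypothetical helper, not part of the released code.

```python
def make_length_batches(examples, max_tokens=10_000):
    """Group token sequences of similar length into batches whose total
    token count stays within max_tokens (assumed to be an upper bound).
    """
    batches, batch, batch_tokens = [], [], 0
    for ex in sorted(examples, key=len):     # similar lengths end up adjacent
        if batch and batch_tokens + len(ex) > max_tokens:
            batches.append(batch)
            batch, batch_tokens = [], 0
        batch.append(ex)
        batch_tokens += len(ex)
    if batch:
        batches.append(batch)
    return batches
```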