Predicting Ordinary Differential Equations with Transformers
Authors: Sören Becker, Michal Klein, Alexander Neitz, Giambattista Parascandolo, Niki Kilbertus
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate in extensive empirical evaluations that our model performs better or on par with existing methods in terms of accurate recovery across various settings. |
| Researcher Affiliation | Collaboration | 1Helmholtz AI, Helmholtz Center Munich, Munich, Germany. 2Apple, Paris, France. 3DeepMind, London, United Kingdom. 4OpenAI, San Francisco, United States. 5Technical University of Munich, Germany. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and data are publicly available at https://github.com/soerenab/nsode23. |
| Open Datasets | Yes | Code and data are publicly available at https://github.com/soerenab/nsode23. |
| Dataset Splits | No | The paper mentions selecting the best model by 'validation loss' (Appendix A.3) but does not report split percentages or sample counts for its own model's validation set; it only states that, for the baselines, each trajectory is split into training and validation intervals. |
| Hardware Specification | Yes | The model is trained on an internal academic compute cluster using 4 Nvidia A100 GPUs for 25 epochs... |
| Software Dependencies | No | The paper lists several software packages in Table 9 (e.g., Python, PyTorch, NumPy, SciPy, SymPy, Hugging Face), but it only provides publication years or general URLs, not the specific version numbers required for reproducibility (e.g., 'PyTorch (Paszke et al., 2019)' instead of 'PyTorch 1.9'). |
| Experiment Setup | Yes | We choose a batch size of 600 and use a linear learning rate warm-up over 10,000 optimization steps, after which we keep the learning rate constant at 10^-4. For the fixed tokens that are used to decode constants, we choose an equidistant grid -10 = x_1 < x_2 < ... < x_m = 10 with m = 21. |
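
The reported setup fixes three concrete hyperparameters: a batch size of 600, a linear warm-up to a constant learning rate of 10^-4 over 10,000 steps, and a 21-point equidistant grid on [-10, 10] for decoding constants. The sketch below is a minimal illustration of how such a schedule and grid could be constructed; the function and variable names (`learning_rate`, `constant_grid`) are assumptions for illustration, not taken from the released code.

```python
import numpy as np

def learning_rate(step: int, warmup_steps: int = 10_000, peak_lr: float = 1e-4) -> float:
    """Linear warm-up to peak_lr over warmup_steps, then constant.

    The schedule shape follows the paper's description; the name and
    signature of this helper are illustrative assumptions.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

# Equidistant grid for constant-decoding tokens:
# -10 = x_1 < x_2 < ... < x_m = 10 with m = 21.
m = 21
constant_grid = np.linspace(-10.0, 10.0, m)

print(learning_rate(5_000))   # 5e-05, halfway through warm-up
print(constant_grid[:3])      # [-10.  -9.  -8.]
```

With m = 21 points and both endpoints included, adjacent grid points are spaced 1.0 apart, which bounds the resolution at which a constant can be represented by a single grid token.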