Poseidon: Efficient Foundation Models for PDEs
Authors: Maximilian Herde, Bogdan Raonić, Tobias Rohner, Roger Käppeli, Roberto Molinaro, Emmanuel de Bézenac, Siddhartha Mishra
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | POSEIDON is pretrained on a diverse, large-scale dataset for the governing equations of fluid dynamics. It is then evaluated on a suite of 15 challenging downstream tasks that include a wide variety of PDE types and operators. We show that POSEIDON exhibits excellent performance across the board by outperforming baselines significantly, both in terms of sample efficiency and accuracy. |
| Researcher Affiliation | Academia | Maximilian Herde (1), Bogdan Raonić (1,2), Tobias Rohner (1), Roger Käppeli (1), Roberto Molinaro (1), Emmanuel de Bézenac (1), Siddhartha Mishra (1,2); (1) Seminar for Applied Mathematics, ETH Zurich, Switzerland; (2) ETH AI Center, Zurich, Switzerland |
| Pseudocode | No | The paper describes the model architecture and computational realizations through prose and mathematical equations but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Finally, the POSEIDON model as well as underlying pretraining and downstream datasets are open sourced, with code being available at https://github.com/camlab-ethz/poseidon and pretrained models and datasets at https://huggingface.co/camlab-ethz. (A hedged download sketch follows the table.) |
| Open Datasets | Yes | All these datasets are publicly available with the PDEGYM collection (https://huggingface.co/collections/camlab-ethz/pdegym-665472c2b1181f7d10b40651). |
| Dataset Splits | Yes | We generated 20000 NS-Sines trajectories of which the first 19640 belong to the training set, the next 120 to the validation set, and the last 240 to the test set. (See the index-split sketch after the table.) |
| Hardware Specification | Yes | The model is pretrained on 8 NVIDIA RTX 4090 GPUs using the following (data-parallel) training protocol: ... All our pretrainings were performed in (data-)parallel on 8 NVIDIA GeForce RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions using 'AdamW [41]' as an optimizer and states 'Everything is tightly integrated into Huggingface Transformers [73] and we make heavy use of Huggingface Accelerate for distributed training.' However, specific version numbers for these software components (e.g., PyTorch, Huggingface Transformers, or Accelerate), which are crucial for full reproducibility, are not provided. |
| Experiment Setup | Yes | Optimizer: AdamW [41]; Scheduler: cosine decay with linear warmup of 2 epochs; Maximum learning rate: 10⁻³; Weight decay: 0.1; Effective batch size: 640 (per-device batch size of 80); Number of epochs: 40; Early stopping: no; Gradient clipping (maximal norm): 5. (A hedged PyTorch sketch of this protocol follows the table.) |
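The pretrained models and the PDEGYM datasets referenced above are hosted under the camlab-ethz organization on Hugging Face. The sketch below is a minimal, hedged example of fetching them with `huggingface_hub`; the repository ids are placeholders, and the exact names should be read off the model page and the PDEGYM collection linked in the table.

```python
from huggingface_hub import snapshot_download

# Placeholder repository ids: substitute the exact names listed under
# https://huggingface.co/camlab-ethz and the PDEGYM collection page.
MODEL_REPO = "camlab-ethz/<poseidon-model-id>"    # hypothetical id
DATASET_REPO = "camlab-ethz/<pdegym-dataset-id>"  # hypothetical id

# Download the pretrained model weights into the local Hugging Face cache.
model_dir = snapshot_download(repo_id=MODEL_REPO)

# Datasets live in dataset-type repositories, so repo_type must be set.
data_dir = snapshot_download(repo_id=DATASET_REPO, repo_type="dataset")

print("model files:", model_dir)
print("dataset files:", data_dir)
```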
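As a concrete reading of the NS-Sines split quoted above (assuming trajectories are indexed in generation order), the 20000 trajectories partition by index as follows:

```python
# Index-based split of the 20000 NS-Sines trajectories described above:
# first 19640 for training, next 120 for validation, last 240 for testing.
n_total, n_train, n_val, n_test = 20_000, 19_640, 120, 240
assert n_train + n_val + n_test == n_total

train_idx = range(0, n_train)               # trajectories 0 .. 19639
val_idx = range(n_train, n_train + n_val)   # trajectories 19640 .. 19759
test_idx = range(n_train + n_val, n_total)  # trajectories 19760 .. 19999
```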
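Finally, the pretraining protocol in the last row maps onto a standard PyTorch training loop. The sketch below is an assumption-laden reconstruction rather than the authors' code: the model, data, and loss are stand-ins, and only the optimizer, warmup/cosine schedule, batch size, epoch count, and gradient clipping follow the reported values (the effective batch size of 640 corresponds to a per-device batch of 80 across 8 data-parallel GPUs).

```python
import math
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR
from torch.utils.data import DataLoader, TensorDataset

# Hyperparameters reported in the table above.
MAX_LR, WEIGHT_DECAY, NUM_EPOCHS, WARMUP_EPOCHS, MAX_GRAD_NORM = 1e-3, 0.1, 40, 2, 5.0
PER_DEVICE_BATCH = 80  # effective batch of 640 across 8 data-parallel GPUs

# Stand-in model and data; the real setup trains the Poseidon operator on PDEGYM.
model = nn.Linear(16, 16)
loader = DataLoader(TensorDataset(torch.randn(800, 16), torch.randn(800, 16)),
                    batch_size=PER_DEVICE_BATCH)
loss_fn = nn.MSELoss()

optimizer = AdamW(model.parameters(), lr=MAX_LR, weight_decay=WEIGHT_DECAY)

steps_per_epoch = len(loader)
warmup_steps = WARMUP_EPOCHS * steps_per_epoch
total_steps = NUM_EPOCHS * steps_per_epoch

def lr_lambda(step: int) -> float:
    # Linear warmup over the first 2 epochs, then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)

for epoch in range(NUM_EPOCHS):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        # Gradient clipping with maximal norm 5, as reported above.
        torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
        optimizer.step()
        scheduler.step()
```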