Multiple Physics Pretraining for Spatiotemporal Surrogate Models
Authors: Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Parker, Ruben Ohana, Miles Cranmer, Alberto Bietti, Michael Eickenberg, Siavash Golkar, Géraud Krawezik, François Lanusse, Mariel Pettee, Tiberiu Tesileanu, Kyunghyun Cho, Shirley Ho
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the efficacy of our approach on both pretraining and downstream tasks over a broad fluid mechanics-oriented benchmark. We show that a single MPP-pretrained transformer is able to match or outperform task-specific baselines on all pretraining sub-tasks without the need for finetuning. |
| Researcher Affiliation | Collaboration | 1 Flatiron Institute, 2 University of Colorado Boulder, 3 University of Cambridge, 4 Université Paris-Saclay, Université Paris Cité, CEA, CNRS, AIM, 5 Physics Division, Lawrence Berkeley National Laboratory, 6 New York University, 7 Prescient Design, Genentech, 8 CIFAR Fellow, 9 Princeton University |
| Pseudocode | No | The paper includes architectural diagrams and descriptions of its methods but does not contain a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We open-source our code and model weights trained at multiple scales for reproducibility. |
| Open Datasets | Yes | Data. We use the full collection of two-dimensional time-dependent simulations from PDEBench (Takamoto et al., 2022) as our primary source for diverse pretraining data. ... To train and evaluate our models, we use the publicly available PDEBench dataset (Takamoto et al., 2022). We summarize the data included in this section. (See the file-inspection sketch after the table.) |
| Dataset Splits | Yes | Train/Val/Test: 0.8/0.1/0.1 split per dataset at the trajectory level (see the trajectory-split sketch after the table). |
| Hardware Specification | Yes | Hardware. All training for both pretraining and finetuning is done using Distributed Data Parallel (DDP) across 8 Nvidia H100-80GB GPUs (see the DDP skeleton after the table). |
| Software Dependencies | Yes | Software. All model development and training in this paper is performed using PyTorch 2.0 (Paszke et al., 2019). |
| Experiment Setup | Yes | For MPP, we train using the following settings: Training Duration: 200K steps; Micro-batch size: 8; Accumulation Steps: 5; Optimizer: Adan (Xie et al., 2023); Weight Decay: 1E-3; Drop Path: 0.1; Base LR: DAdaptation (Defazio & Mishchenko, 2023); LR Schedule: Cosine decay; Gradient clipping: 1.0 (see the training-step sketch after the table). |
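The sketches below elaborate on the dataset, split, hardware, and experiment-setup rows above. First, the PDEBench data ships as HDF5 files; a minimal inspection example with `h5py` is shown here. The file name and the commented-out dataset key are hypothetical, since each PDEBench benchmark uses its own internal layout, so the keys should be checked against the PDEBench repository's data loaders rather than taken from this sketch.

```python
import h5py

# Hypothetical file name; substitute the actual PDEBench download.
PATH = "2D_CFD_example.hdf5"

with h5py.File(PATH, "r") as f:
    # Inspect the group/dataset names before assuming a layout; the 2D
    # benchmarks differ in how fields and trajectories are organized.
    f.visit(print)
    # Example read once the key is known (the key name is an assumption):
    # first_trajectory = f["Vx"][0]   # shape ~ (time, x, y)
```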
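Next, a minimal sketch of the 0.8/0.1/0.1 split performed at the trajectory level. The function and its arguments are illustrative assumptions rather than the authors' released code; the point is that whole trajectories, not individual frames, are assigned to a partition.

```python
import numpy as np

def split_trajectories(n_trajectories: int, seed: int = 0):
    """Split trajectory indices into train/val/test at a 0.8/0.1/0.1 ratio.

    Splitting at the trajectory level keeps every frame of a given
    simulation in the same partition, avoiding leakage between the
    training and evaluation sets.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_trajectories)
    n_train = int(0.8 * n_trajectories)
    n_val = int(0.1 * n_trajectories)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

# Example: 1,000 simulated trajectories in one dataset.
train_idx, val_idx, test_idx = split_trajectories(1000)
```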
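The hardware row reports Distributed Data Parallel across 8 H100 GPUs. A generic DDP skeleton is sketched below, assuming a `torchrun --nproc_per_node=8 train.py` launch; the linear layer is a placeholder for the paper's transformer, and no claim is made about the authors' actual launch script.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_ddp() -> int:
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE environment variables.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    return local_rank

def main():
    local_rank = setup_ddp()
    # Placeholder model; the paper trains a transformer surrogate instead.
    model = torch.nn.Linear(128, 128).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    # ... training loop goes here ...
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```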
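Finally, the training settings in the last row translate into an update step with gradient accumulation (micro-batch 8, 5 accumulation steps), cosine learning-rate decay, weight decay 1e-3, and gradient clipping at 1.0. The sketch below uses `AdamW` as a stand-in for the Adan optimizer with D-Adaptation that the paper specifies, since those come from separate libraries whose APIs are not reproduced here; the model and loss are placeholders.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Settings quoted from the paper's training table.
TOTAL_STEPS = 200_000
MICRO_BATCH = 8
ACCUM_STEPS = 5
WEIGHT_DECAY = 1e-3
CLIP_NORM = 1.0

model = torch.nn.Linear(128, 128)  # placeholder, not the MPP model
# AdamW stands in for Adan + D-Adaptation (learning-rate-free base LR).
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=WEIGHT_DECAY)
scheduler = CosineAnnealingLR(optimizer, T_max=TOTAL_STEPS)

def train_step(micro_batches, loss_fn):
    """One optimizer update accumulated over ACCUM_STEPS micro-batches."""
    optimizer.zero_grad()
    for x, y in micro_batches:  # len(micro_batches) == ACCUM_STEPS
        loss = loss_fn(model(x), y) / ACCUM_STEPS
        loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)
    optimizer.step()
    scheduler.step()
```

If the micro-batch size of 8 is per GPU, these settings imply an effective batch of 8 × 5 × 8 = 320 samples per optimizer step across the 8-GPU DDP run, though the paper's quoted table does not state this explicitly.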