PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers
Authors: Phillip Lippe, Bas Veeling, Paris Perdikaris, Richard Turner, Johannes Brandstetter
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. We further demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner's connection to diffusion models enables an accurate and efficient assessment of the model's predictive uncertainty, allowing us to estimate when the surrogate becomes inaccurate. |
| Researcher Affiliation | Collaboration | Phillip Lippe (Microsoft Research AI4Science; phillip.lippe@googlemail.com), Bastiaan S. Veeling (Microsoft Research AI4Science), Paris Perdikaris (Microsoft Research AI4Science), Richard E. Turner (Microsoft Research AI4Science), Johannes Brandstetter (Microsoft Research AI4Science; brandstetter@ml.jku.at). Work done during internship at Microsoft Research; on leave from University of Amsterdam. |
| Pseudocode | Yes | Appendix C (PDE-Refiner Pseudocode): In this section, we provide pseudocode to implement PDE-Refiner in Python with common deep learning frameworks like PyTorch [62] and JAX [6]. (A hedged sketch of the refinement loop is given after the table.) |
| Open Source Code | Yes | We make our code publicly available at https://github.com/microsoft/pdearena. |
| Open Datasets | Yes | We follow the data generation setup of Brandstetter et al. [8] by using a mesh of length L discretized uniformly for 256 points with periodic boundaries. For each trajectory, we randomly sample the length L between [0.9·64, 1.1·64] and the time step ∆t ∼ U(0.18, 0.22). The initial conditions are sampled from a distribution over truncated Fourier series with random coefficients {A_m, ℓ_m, ϕ_m}_m as u0(x) = Σ_{m=1}^{10} A_m sin(2π ℓ_m x / L + ϕ_m) (see the initial-condition sketch after the table). We generate a training dataset with 2048 trajectories of rollout length 140 ∆t, and test on 128 trajectories with a duration of 640 ∆t. Our dataset can be reproduced with the public code of Brandstetter et al. [8]. Following previous work [43, 79], we set the forcing to f = sin(4y)x̂ − 0.1u, the density ρ = 1, and viscosity ν = 0.001... To align our experiments with previous results, we use the same dataset of 128 trajectories for training and 16 trajectories for testing as Sun et al. [79]. |
| Dataset Splits | Yes | We generate a training dataset with 2048 trajectories of rollout length 140 ∆t, and test on 128 trajectories with a duration of 640 ∆t. For hyperparameter tuning, we additionally generate a validation set of the same size as the test data with initial seed 123. |
| Hardware Specification | Yes | In terms of computational resources, all experiments have been performed on NVIDIA V100 GPUs with 16GB memory. The speed comparison for the 2D Kolmogorov Flow was performed on an NVIDIA A100 GPU with 80GB memory. |
| Software Dependencies | Yes | As existing software assets, we base our implementation on the PDE-Arena [22], which implements a Python-based training framework for neural PDE solvers in PyTorch [62] and PyTorch Lightning [14]. For the diffusion models, we use the diffusers library [65]. We use Matplotlib [34] for plotting and NumPy [89] for data handling. For data generation, we use SciPy [85] in the public code of Brandstetter et al. [8] for the KS equation, and JAX [6] in the public code of Kochkov et al. [43] and Sun et al. [79] for the 2D Kolmogorov Flow dataset. We implement the diffusion model using the diffusers library [65] (version 0.15) in the pseudocode below. |
| Experiment Setup | Yes | We detail the used hyperparameters for all models in Table 3. We train the models for 400 epochs with a batch size of 128 with an AdamW optimizer [52]. One epoch corresponds to iterating through all training sequences and picking 100 random initial conditions each. The learning rate is initialized with 1e-4 and follows a cosine annealing strategy to end with a final learning rate of 1e-6. We did not find learning rate warmup to be needed for our models. For regularization, we use a weight decay of 1e-5. As mentioned in Section 4.1, we train the neural operators to predict 4 time steps ahead via predicting the residual ∆u = u(t) − u(t − 4∆t). For better output coverage of the neural network, we normalize the residual to a standard deviation of about 1 by dividing it by 0.3 (see the training-setup sketch after the table). |
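
The Pseudocode row above points to the paper's Appendix C, which is not reproduced here. Purely as an illustration, the snippet below sketches one PDE-Refiner prediction step in PyTorch. The `model(noised_estimate, u_prev, k)` interface, the exponential noise schedule σ_k = min_noise_std^(k/K), and the default values are assumptions of this sketch, not the authors' released pseudocode.

```python
import torch

@torch.no_grad()
def pde_refiner_step(model, u_prev, num_refinements: int = 3, min_noise_std: float = 2e-7):
    """Predict u(t) from u_prev = u(t - dt) via iterative refinement (sketch only).

    Assumes `model(noised_estimate, u_prev, k)` returns the initial prediction
    at k = 0 and an estimate of the injected noise at k > 0.
    """
    k0 = torch.zeros(u_prev.shape[0], dtype=torch.long, device=u_prev.device)
    # Refinement step 0: direct prediction from a zero "estimate".
    u_hat = model(torch.zeros_like(u_prev), u_prev, k0)
    for k in range(1, num_refinements + 1):
        # Exponentially decreasing noise level (assumed schedule).
        sigma_k = min_noise_std ** (k / num_refinements)
        noise = torch.randn_like(u_hat)
        u_noised = u_hat + sigma_k * noise          # perturb current estimate
        k_t = torch.full_like(k0, k)
        pred_noise = model(u_noised, u_prev, k_t)   # model predicts the noise
        u_hat = u_noised - sigma_k * pred_noise     # refine by removing it
    return u_hat
```

For long rollouts, this step would be applied autoregressively, feeding each prediction back in as `u_prev`.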
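
For the KS dataset description in the Open Datasets row, the following NumPy sketch samples one initial condition u0(x) = Σ_{m=1}^{10} A_m sin(2π ℓ_m x / L + ϕ_m) on a 256-point periodic mesh. Only L and the mesh size are specified above; the ranges for A_m, ℓ_m, and ϕ_m are placeholders, and the exact generator is in the public code of Brandstetter et al. [8].

```python
import numpy as np

def sample_ks_initial_condition(rng: np.random.Generator, n_points: int = 256, n_modes: int = 10):
    # Domain length L ~ U(0.9*64, 1.1*64), as stated in the dataset description.
    L = rng.uniform(0.9 * 64, 1.1 * 64)
    # Uniform periodic mesh (endpoint excluded because of periodic boundaries).
    x = np.linspace(0.0, L, n_points, endpoint=False)
    # Coefficient ranges below are assumptions for illustration only.
    A = rng.uniform(-0.5, 0.5, size=n_modes)            # amplitudes A_m
    ell = rng.integers(1, 9, size=n_modes)              # integer wavenumbers l_m
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n_modes)   # phases phi_m
    # Truncated Fourier series u0(x) = sum_m A_m sin(2*pi*l_m*x/L + phi_m).
    u0 = np.sum(A[:, None] * np.sin(2.0 * np.pi * ell[:, None] * x[None, :] / L + phi[:, None]), axis=0)
    return x, u0

# Usage: rng = np.random.default_rng(123); x, u0 = sample_ks_initial_condition(rng)
```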
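
The Experiment Setup row maps onto standard PyTorch utilities. Below is a minimal sketch of that optimization setup and the residual target; stepping the cosine schedule once per epoch and the helper names are assumptions of this sketch.

```python
import torch

def configure_optimization(model: torch.nn.Module, epochs: int = 400):
    """AdamW (lr 1e-4, weight decay 1e-5) with cosine annealing down to 1e-6."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=1e-6  # assumes one scheduler step per epoch
    )
    return optimizer, scheduler

def residual_target(u_t: torch.Tensor, u_prev: torch.Tensor, norm: float = 0.3) -> torch.Tensor:
    """Training target: residual u(t) - u(t - 4*dt), scaled to roughly unit std by dividing by 0.3."""
    return (u_t - u_prev) / norm
```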