Learning Neural PDE Solvers with Parameter-Guided Channel Attention

Authors: Makoto Takamoto, Francesco Alesiani, Mathias Niepert

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare CAPE in conjunction with the curriculum learning strategy using a popular PDE benchmark and obtain consistent and significant improvements over the baseline models. The experiments also show several advantages of CAPE, such as its increased ability to generalize to unseen PDE parameters without large increases in inference time and parameter count.
Researcher Affiliation | Collaboration | NEC Laboratories Europe, Heidelberg, Germany; University of Stuttgart, Stuttgart, Germany. Correspondence to: Makoto Takamoto <makoto.takamoto@neclab.eu>.
Pseudocode | Yes | Algorithm 1: the curriculum training strategy (see the curriculum sketch after the table).
Open Source Code | Yes | An implementation of the method and experiments is available at https://github.com/nec-research/CAPE-ML4Sci.
Open Datasets | Yes | We used datasets provided by PDEBench (Takamoto et al., 2022), a benchmark for SciML, from which we selected the following PDEs.
Dataset Splits | Yes | For 1-dimensional PDEs, we used N = 9000 training instances and 1000 test instances for each PDE parameter with resolution 128. For 2-dimensional NS equations, we used N = 900 training instances and 100 test instances for each PDE parameter with spatial resolution 64 × 64.
Hardware Specification | Yes | The training was performed on a GeForce RTX 2080 GPU for 1D PDEs and a GeForce RTX 3090 for 2D NS equations. The experiments were run using an Nvidia GeForce RTX 3090 with CUDA 11.6.
Software Dependencies | Yes | The ML models are implemented using PyTorch 1.12.1 and the numerical simulations with JAX 0.3.17.
Experiment Setup | Yes | The optimization was performed with Adam (Kingma & Ba) for 100 epochs. The learning rate was set to 3 × 10⁻³ and divided by 2.0 every 20 epochs. The mini-batch size was 50 in all cases. To stabilize the CAPE module's training in the initial phase, we empirically found it slightly better to include a warm-up phase during which only the CAPE module is updated. We performed the warm-up for the first 3 epochs, which slightly reduces the fluctuations in final performance caused by the random initialization of the network weights. In the CAPE module, the kernel size of the depth-wise convolution was set to 5. (See the training-setup sketch after the table.)
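
Curriculum sketch (Pseudocode row). Algorithm 1 itself is not reproduced in this summary; the following is a minimal PyTorch-style sketch, assuming the curriculum gradually shifts the solver's input from the ground-truth previous step to the model's own prediction as training progresses. The mixing schedule alpha(), the loader format, and the model signature are hypothetical placeholders, not the authors' implementation.

    import torch

    def alpha(epoch, num_epochs=100):
        # Hypothetical mixing schedule: 0.0 = pure teacher forcing,
        # 1.0 = fully autoregressive input (the paper's exact schedule may differ).
        return min(1.0, 2.0 * epoch / num_epochs)

    def train_epoch(model, loader, optimizer, loss_fn, epoch):
        model.train()
        for u_tm1, u_t, u_tp1, pde_params in loader:        # consecutive snapshots t-1, t, t+1
            with torch.no_grad():
                u_t_pred = model(u_tm1, pde_params)          # model's own estimate of step t
            a = alpha(epoch)
            u_in = (1.0 - a) * u_t + a * u_t_pred            # blend ground truth with the prediction
            loss = loss_fn(model(u_in, pde_params), u_tp1)   # predict step t+1
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()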
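
Training-setup sketch (Experiment Setup row). A minimal sketch of the stated optimization setup: Adam at 3e-3 halved every 20 epochs, mini-batch size 50, 100 epochs, and a 3-epoch warm-up during which only the CAPE module is updated. The attribute names model.cape and model.backbone, the dataset object train_set, and the MSE loss are assumptions for illustration, not the repository's actual code.

    import torch
    from torch.utils.data import DataLoader

    # Hypothetical layout: `model.backbone` is the base neural solver, `model.cape` the CAPE module,
    # and `train_set` a PDEBench training split.
    loader = DataLoader(train_set, batch_size=50, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)  # halve lr every 20 epochs
    loss_fn = torch.nn.MSELoss()                              # placeholder loss

    for epoch in range(100):
        warmup = epoch < 3                                    # warm-up: update only the CAPE module
        for p in model.backbone.parameters():
            p.requires_grad_(not warmup)
        train_epoch(model, loader, optimizer, loss_fn, epoch) # curriculum loop from the sketch above
        scheduler.step()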