Training Deep Surrogate Models with Large Scale Online Learning

Authors: Lucas Thibaut Meyer, Marc Schouler, Robert Alexander Caulk, Alejandro Ribes, Bruno Raffin

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments compare the offline and online training of four surrogate models, including state-of-the-art architectures. Results indicate that exposing deep surrogate models to more dataset diversity, up to hundreds of GB, can increase model generalization capabilities. Prediction accuracy improves by 68% for fully connected neural networks, 16% for the Fourier Neural Operator (FNO), and 7% for the Message Passing PDE Solver.
Researcher Affiliation | Collaboration | (1) Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LIG; (2) Industrial AI Laboratory SINCLAIR, EDF Lab Paris-Saclay.
Pseudocode | No | The paper includes architectural diagrams and descriptions of data flow, but it does not provide any pseudocode or algorithm blocks.
Open Source Code | Yes | Code and documentation are respectively available at https://gitlab.inria.fr/melissa/melissa and https://melissa.gitlabpages.inria.fr/melissa/
Open Datasets | Yes | The Message Passing Neural PDE Solver is trained on the mixed advection-diffusion dataset... The implementation of the example is directly taken from the PDEBench paper (Takamoto et al., 2022), D.1... The implementation of the example is directly taken from the original paper of the Message Passing PDE Solver (Brandstetter et al., 2022), experiment E3 in Section 4.1.
Dataset Splits | Yes | The Message Passing Neural PDE Solver is trained on the mixed advection-diffusion dataset consisting of a training set and a test set of respectively 2,048 and 128 trajectories, as in Brandstetter et al. (2022)... The same validation dataset is used to evaluate the performance of both training sessions. It consists of 200 trajectories generated offline... The validation dataset consists of 10 trajectories. (See the split sketch after the table.)
Hardware Specification | Yes | Experiments are run on NVIDIA V100 GPUs and Intel Xeon 2.5 GHz processors.
Software Dependencies | No | The paper mentions software such as Python, PyTorch, TensorFlow, ZeroMQ, Slurm, and OAR, but it does not specify version numbers for these components.
Experiment Setup | Yes | The model architecture consists of 3 layers of 1024 features, each followed by a ReLU activation function except for the output layer... Both online and offline training procedures follow the same learning rate schedule, starting at 1e-3 and decaying exponentially. For offline training, the number of epochs is adjusted to always train with 100,000 batches... For all training strategies, the batch size is 1024. (See the training sketch after the table.)
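
The split sizes in the Dataset Splits row can be illustrated with a short PyTorch sketch. The tensor shapes, seed, and use of random_split below are illustrative assumptions, not the paper's actual data pipeline, which reuses the pre-generated trajectories of Brandstetter et al. (2022):

```python
# Illustrative sketch of the 2,048 / 128 trajectory split reported above
# for the mixed advection-diffusion dataset. Trajectory shapes and the
# seed are placeholders, not values taken from the paper.
import torch
from torch.utils.data import TensorDataset, random_split

N_TRAIN, N_TEST = 2048, 128
# Placeholder data standing in for the simulated trajectories
# (assumed shape: trajectories x timesteps x grid points).
trajectories = torch.randn(N_TRAIN + N_TEST, 100, 256)
dataset = TensorDataset(trajectories)

train_set, test_set = random_split(
    dataset, [N_TRAIN, N_TEST],
    generator=torch.Generator().manual_seed(0),
)
```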
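
The Experiment Setup row translates into a compact PyTorch sketch. The input/output dimensions, the Adam optimizer, the MSE loss, and the decay factor are assumptions; the quoted text specifies only the layer widths, activations, initial learning rate, and batch size:

```python
# Minimal sketch of the fully connected surrogate described above:
# 3 linear layers of 1024 features, ReLU after every layer except the
# output, learning rate 1e-3 with exponential decay, batch size 1024.
# IN_DIM, OUT_DIM, GAMMA, Adam, and MSE are assumptions for illustration.
import torch
import torch.nn as nn

IN_DIM, OUT_DIM = 1024, 1024   # hypothetical input/output field sizes
GAMMA = 0.999                  # assumed per-step exponential decay factor

model = nn.Sequential(
    nn.Linear(IN_DIM, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, OUT_DIM),  # no activation on the output layer
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=GAMMA)
loss_fn = nn.MSELoss()

def training_step(batch_x, batch_y):
    """One optimization step on a single batch."""
    optimizer.zero_grad()
    loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # exponential learning-rate decay
    return loss.item()

# One batch of size 1024, as reported for all training strategies.
x = torch.randn(1024, IN_DIM)
y = torch.randn(1024, OUT_DIM)
print(training_step(x, y))
```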