An operator preconditioning perspective on training in physics-informed machine learning

Authors: Tim De Ryck, Florent Bonnet, Siddhartha Mishra, Emmanuel de Bézenac

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We employ both rigorous mathematical analysis and empirical evaluations to investigate various strategies, explaining how they better condition this critical operator, and consequently improve training.
Researcher Affiliation | Academia | Tim De Ryck, Seminar for Applied Mathematics, ETH Zürich, Switzerland; Florent Bonnet, Institute of Intelligent Systems and Robotics, Extrality, Sorbonne Université, France; Siddhartha Mishra, Seminar for Applied Mathematics, ETH AI Center, ETH Zürich, Switzerland; Emmanuel de Bézenac, Seminar for Applied Mathematics, ETH Zürich, Switzerland
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code or a link to a code repository.
Open Datasets | No | The paper solves PDEs on a discretized domain rather than using a named public dataset. For example, 'We discretize the domain Ω on a grid of size 256 × 100 for the learning process...'
Dataset Splits | No | The paper discusses discretizing the domain for training and computing the matrix A, but it does not specify explicit train/validation/test splits of a dataset.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | Yes | Throughout all the experiments, we use torch version 2.0.1 and incorporate functional routines from torch.func. ... For the linear models we use a routine to compute its condition number and find the optimal λ before training using the black-box gradient free 1d optimisation method from scipy.optimize.golden.
Experiment Setup | Yes | Once the matrix A is computed, for the linear models we use a routine to compute its condition number and find the optimal λ before training using the black-box gradient free 1d optimisation method from scipy.optimize.golden. The learning rate is then chosen as 1/λmax. For models with MLPs this is no longer possible because of the zero eigenvalues that are plaguing the matrix; we resort to grid search sweeping wide ranges of learning rates and λ values. ... For the linear model, we use classical batch gradient descent on the full grid with and without preconditioning for 200 epochs. ... For the MLP, we use Adam on the full grid without preconditioning for 10000 epochs. The learning rate is set to 0.0001 and λ is set to 1 via grid search... (A sketch of the λ-selection and learning-rate procedure is given below the table.)
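
The setup reported in the "Experiment Setup" row (assemble the matrix A, minimise its condition number over the weight λ with scipy.optimize.golden, then take the learning rate as 1/λmax) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: build_A is a hypothetical stand-in for whatever routine assembles the Hermitian matrix A for a given λ, and the bracketing interval, eigenvalue cutoff, and use of NumPy eigenvalue routines are all assumptions rather than details taken from the paper.

```python
# Minimal sketch of the lambda / learning-rate selection described above.
# `build_A` is a hypothetical callable returning the Hermitian matrix A for a
# given weight lambda; bracket and cutoff values below are assumptions.
import numpy as np
from scipy.optimize import golden


def condition_number(lam, build_A):
    """kappa(A(lambda)) computed from the eigenvalues of the Hermitian matrix A."""
    eigvals = np.linalg.eigvalsh(build_A(lam))   # real eigenvalues, ascending order
    eigvals = eigvals[eigvals > 1e-12]           # drop (near-)zero eigenvalues
    return eigvals[-1] / eigvals[0]


def choose_lambda_and_lr(build_A):
    """Golden-section search over lambda, then learning rate = 1 / lambda_max."""
    # scipy.optimize.golden is the gradient-free 1-d minimiser cited in the setup.
    lam_opt = golden(lambda lam: condition_number(lam, build_A), brack=(0.1, 10.0))
    lam_max = np.linalg.eigvalsh(build_A(lam_opt))[-1]
    return lam_opt, 1.0 / lam_max
```

With λ and the learning rate chosen this way, the linear model is then trained with plain batch gradient descent on the full grid for 200 epochs, as described in the row above; for the MLPs the paper instead reports falling back to a grid search over learning rates and λ because of the zero eigenvalues of A.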