Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks

Authors: Felix Dangel, Johannes Müller, Marius Zeinhofer

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS (Section 4). From Section 4, Experiments: We implement KFAC, KFAC*, and ENGD with either the per-layer or full Gramian in PyTorch [46].
Researcher Affiliation Collaboration Felix Dangel (Vector Institute, Toronto, Canada; fdangel@vectorinstitute.ai); Johannes Müller (Chair of Mathematics of Information Processing, RWTH Aachen University, Aachen, Germany; mueller@mathc.rwth-aachen.de); Marius Zeinhofer (Seminar for Applied Mathematics, ETH Zürich, and Department of Nuclear Medicine, University Hospital Freiburg; marius.zeinhofer@uniklinik-freiburg.de)
Pseudocode Yes Appendix B (Pseudo-Code: KFAC for the Poisson Equation), Algorithm 1: KFAC for the Poisson equation.
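The paper's Algorithm 1 is not reproduced on this page. As a rough orientation only, the snippet below is a minimal, hypothetical sketch of a KFAC-style preconditioned step for a single linear layer, assuming Kronecker factors A (built from layer inputs) and B (built from output gradients) and a damping constant; it illustrates the general structure of such an update, not the authors' exact algorithm.

```python
import torch

def kfac_step(grad_W, A, B, damping=1e-3, lr=1e-1):
    """Hypothetical KFAC-style update for one linear layer's weight matrix.

    grad_W : (d_out, d_in) gradient of the loss w.r.t. the weights
    A      : (d_in, d_in)  Kronecker factor from layer inputs
    B      : (d_out, d_out) Kronecker factor from output gradients
    """
    d_in, d_out = A.shape[0], B.shape[0]
    # Damp both factors before inverting, as is standard in KFAC-style methods.
    A_damped = A + damping * torch.eye(d_in, dtype=A.dtype, device=A.device)
    B_damped = B + damping * torch.eye(d_out, dtype=B.dtype, device=B.device)
    # Preconditioned gradient: (B + damping*I)^{-1} grad_W (A + damping*I)^{-1}.
    nat_grad = torch.linalg.solve(B_damped, torch.linalg.solve(A_damped, grad_W.T).T)
    # Return the weight update for this step.
    return -lr * nat_grad
```

In practice the factors A and B would be accumulated over a batch (and, per the tuning protocol below, combined with an exponential moving average); those details are omitted here.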
Open Source Code Yes We will open-source our KFAC implementations, as well as the code to fully reproduce all experiments and the original data presented in this manuscript.
Open Datasets Yes Pedagogical example: 2d Poisson equation. We start with a low-dimensional Poisson equation from Müller & Zeinhofer [41] to reproduce ENGD's performance (Figure 1). It is given by −Δu(x, y) = 2π² sin(πx) sin(πy) for (x, y) ∈ (0, 1)², u(x, y) = 0 for (x, y) ∈ ∂(0, 1)² (16). We choose a fixed data set of the same size as the original paper, then use random/grid search to evaluate the performance of all optimizers for different tanh-activated MLPs...
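As a hedged illustration of how such collocation data can be assembled (the exact sampling scheme and point counts are the original paper's and are not reproduced here; the numbers below are placeholders), the following sketch draws interior and boundary points on the unit square and evaluates the known analytic solution u(x, y) = sin(πx) sin(πy) of problem (16), which is what makes the error estimation below possible.

```python
import torch

def poisson_2d_data(n_interior=900, n_per_edge=30, seed=0):
    """Hypothetical collocation data for the 2d Poisson problem on (0, 1)^2."""
    gen = torch.Generator().manual_seed(seed)
    # Interior points where the PDE residual -laplace(u) - f is enforced.
    x_int = torch.rand(n_interior, 2, generator=gen)
    f_int = 2 * torch.pi**2 * torch.sin(torch.pi * x_int[:, 0]) * torch.sin(torch.pi * x_int[:, 1])
    # Boundary points on the four edges of the unit square, with u = 0 there.
    t = torch.rand(n_per_edge, generator=gen)
    x_bnd = torch.cat([
        torch.stack([t, torch.zeros_like(t)], dim=1),
        torch.stack([t, torch.ones_like(t)], dim=1),
        torch.stack([torch.zeros_like(t), t], dim=1),
        torch.stack([torch.ones_like(t), t], dim=1),
    ])
    u_bnd = torch.zeros(x_bnd.shape[0])
    return x_int, f_int, x_bnd, u_bnd

def exact_solution(x):
    """Analytic solution of (16); used only to evaluate the L2 error."""
    return torch.sin(torch.pi * x[:, 0]) * torch.sin(torch.pi * x[:, 1])
```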
Dataset Splits Yes We report runs with lowest L2 error estimated on a held-out data set with the known solution to the studied PDE.
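A minimal sketch of how such a metric might be computed on held-out points with the known solution (function and variable names are illustrative, not taken from the released code):

```python
import torch

def relative_l2_error(model, x_test, u_exact):
    """Monte-Carlo estimate of the relative L2 error on a held-out set."""
    with torch.no_grad():
        u_pred = model(x_test).squeeze(-1)
    return torch.linalg.norm(u_pred - u_exact) / torch.linalg.norm(u_exact)
```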
Hardware Specification Yes All runs are executed on a compute cluster with RTX 6000 GPUs (24 GiB RAM) in double precision.
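Since the runs are reported in double precision, a reproduction would typically force float64 globally in PyTorch before constructing the model; a minimal sketch:

```python
import torch

# The paper reports double-precision runs; float64 also avoids ill-conditioning
# of the Gramian / Kronecker factors that can appear in single precision.
torch.set_default_dtype(torch.float64)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```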
Software Dependencies No We implement KFAC, KFAC*, and ENGD with either the per-layer or full Gramian in PyTorch [46]. We tune hyper-parameters using Weights & Biases [60].
Experiment Setup Yes A.1 Hyper-Parameter Tuning Protocol. In all our experiments, we tune the following optimizer hyper-parameters and otherwise use the PyTorch default values:
SGD: learning rate, momentum
Adam: learning rate
Hessian-free: type of curvature matrix (Hessian or GGN), damping, whether to adapt damping over time (yes or no), maximum number of CG iterations
LBFGS: learning rate, history size
ENGD: damping, factor of the exponential moving average applied to the Gramian, initialization of the Gramian (zero or identity matrix)
KFAC: factor of the exponential moving average applied to the Kronecker factors, damping, momentum, initialization of the Kronecker factors (zero or identity matrix)
KFAC*: factor of the exponential moving average applied to the Kronecker factors, damping, initialization of the Kronecker factors (zero or identity matrix)
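A hedged example of how one of these searches could be expressed as a Weights & Biases sweep; the project name, metric name, search ranges, and distribution choices below are illustrative placeholders, not the authors' values.

```python
import wandb

# Illustrative random-search sweep over SGD's tuned hyper-parameters
# (learning rate and momentum); ranges are placeholders, not the paper's.
sweep_config = {
    "method": "random",
    "metric": {"name": "l2_error", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-5, "max": 1e-1},
        "momentum": {"values": [0.0, 0.5, 0.9, 0.99]},
    },
}
sweep_id = wandb.sweep(sweep_config, project="kfac-pinns-tuning")
```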