Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks

Authors: Felix Dangel, Johannes Müller, Marius Zeinhofer

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS (Section 4). We implement KFAC, KFAC*, and ENGD with either the per-layer or full Gramian in PyTorch [46].
Researcher Affiliation Collaboration Felix Dangel, Vector Institute, Toronto, Canada (EMAIL); Johannes Müller, Chair of Mathematics of Information Processing, RWTH Aachen University, Aachen, Germany (EMAIL); Marius Zeinhofer, Seminar for Applied Mathematics, ETH Zürich, and Department of Nuclear Medicine, University Hospital Freiburg (EMAIL)
Pseudocode Yes Appendix B, Pseudo-Code: KFAC for the Poisson Equation (Algorithm 1: KFAC for the Poisson equation).
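The pseudocode itself lives in the paper's appendix. As a hedged illustration of why a Kronecker-factored curvature approximation is cheap (assumed small shapes, not the paper's exact Kronecker factors), the sketch below checks the identity (A ⊗ B)⁻¹ vec(V) = vec(B⁻¹ V A⁻ᵀ), which lets a KFAC-style update avoid ever materializing the full curvature matrix:

```python
import torch

torch.manual_seed(0)

# Illustrative Kronecker-factored curvature G ≈ A ⊗ B with two small SPD factors.
n_out, n_in = 3, 4
A = torch.randn(n_in, n_in)
A = A @ A.T + n_in * torch.eye(n_in)      # SPD input-side factor
B = torch.randn(n_out, n_out)
B = B @ B.T + n_out * torch.eye(n_out)    # SPD output-side factor

V = torch.randn(n_out, n_in)              # gradient of a weight matrix
g = V.T.reshape(-1)                       # vec(V), column-major vectorization

# Naive dense solve against the explicit Kronecker product ...
x_dense = torch.linalg.solve(torch.kron(A, B), g)
# ... matches the cheap factored solve (A ⊗ B)^{-1} vec(V) = vec(B^{-1} V A^{-T}).
x_factored = (torch.linalg.solve(B, V) @ torch.linalg.inv(A).T).T.reshape(-1)

assert torch.allclose(x_dense, x_factored, atol=1e-4)
```

The dense solve costs cubic time in `n_out * n_in`, while the factored solve only needs two small systems of size `n_out` and `n_in`, which is the scaling advantage the paper exploits.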
Open Source Code Yes We will open-source our KFAC implementations, as well as the code to fully reproduce all experiments and the original data presented in this manuscript.
Open Datasets Yes Pedagogical example: 2d Poisson equation. We start with a low-dimensional Poisson equation from Müller & Zeinhofer [41] to reproduce ENGD's performance (Figure 1). It is given by −Δu(x, y) = 2π² sin(πx) sin(πy) for (x, y) ∈ (0, 1)², u(x, y) = 0 for (x, y) ∈ ∂[0, 1]². (16) We choose a fixed data set of the same size as the original paper, then use random/grid search to evaluate the performance of all optimizers for different tanh-activated MLPs...
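For concreteness, here is a minimal sketch (not the paper's code; model size and point counts are assumptions) of the interior PINN residual for this Poisson problem with a tanh-activated MLP ansatz, computed with nested automatic differentiation:

```python
import torch

# Small tanh MLP ansatz u_theta: (x, y) -> u (width is an arbitrary choice here).
model = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)

def poisson_residual(xy: torch.Tensor) -> torch.Tensor:
    """Residual of -Δu = 2π² sin(πx) sin(πy) at interior points xy (N, 2)."""
    xy = xy.requires_grad_(True)
    u = model(xy)
    (grad,) = torch.autograd.grad(u.sum(), xy, create_graph=True)
    # Laplacian: sum of the second derivatives w.r.t. x and y.
    lap = 0.0
    for i in range(2):
        (g2,) = torch.autograd.grad(grad[:, i].sum(), xy, create_graph=True)
        lap = lap + g2[:, i]
    f = 2 * torch.pi**2 * torch.sin(torch.pi * xy[:, 0]) * torch.sin(torch.pi * xy[:, 1])
    return -lap - f

xy = torch.rand(16, 2)                        # interior collocation points
loss = poisson_residual(xy).pow(2).mean()     # least-squares interior PINN loss
```

The boundary condition u = 0 on ∂[0, 1]² would contribute an analogous least-squares term on boundary points; it is omitted here for brevity.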
Dataset Splits Yes We report runs with the lowest L2 error estimated on a held-out data set with the known solution to the studied PDE.
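The exact evaluation code is not quoted; a plausible sketch of such a held-out metric (the function names and test-point count are assumptions) is the relative L2 error against the known solution, which for the 2d Poisson example above is u*(x, y) = sin(πx) sin(πy):

```python
import torch

def u_star(xy: torch.Tensor) -> torch.Tensor:
    # Known solution of -Δu = 2π² sin(πx) sin(πy) with zero boundary values.
    return (torch.sin(torch.pi * xy[:, 0]) * torch.sin(torch.pi * xy[:, 1])).unsqueeze(-1)

def rel_l2_error(model: torch.nn.Module, xy_test: torch.Tensor) -> float:
    # Monte-Carlo estimate of ||u_theta - u*|| / ||u*|| on held-out points.
    with torch.no_grad():
        err = model(xy_test) - u_star(xy_test)
        return (err.norm() / u_star(xy_test).norm()).item()

torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
error = rel_l2_error(model, torch.rand(256, 2))  # untrained net: large error expected
```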
Hardware Specification Yes All runs are executed on a compute cluster with RTX 6000 GPUs (24 GiB RAM) in double precision.
Software Dependencies No We implement KFAC, KFAC*, and ENGD with either the per-layer or full Gramian in PyTorch [46]. We tune hyper-parameters using Weights & Biases [60].
Experiment Setup Yes A.1 Hyper-Parameter Tuning Protocol: In all our experiments, we tune the following optimizer hyper-parameters and otherwise use the PyTorch default values:
- SGD: learning rate, momentum
- Adam: learning rate
- Hessian-free: type of curvature matrix (Hessian or GGN), damping, whether to adapt damping over time (yes or no), maximum number of CG iterations
- LBFGS: learning rate, history size
- ENGD: damping, factor of the exponential moving average applied to the Gramian, initialization of the Gramian (zero or identity matrix)
- KFAC: factor of the exponential moving average applied to the Kronecker factors, damping, momentum, initialization of the Kronecker factors (zero or identity matrix)
- KFAC*: factor of the exponential moving average applied to the Kronecker factors, damping, initialization of the Kronecker factors (zero or identity matrix)
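The quoted protocol mentions random/grid search over these hyper-parameters via Weights & Biases. As an illustrative sketch only (the sampling ranges and distributions below are assumptions, not the paper's), one random-search trial could be drawn like this:

```python
import random

# Hypothetical search space: log-uniform learning rates, uniform momentum,
# mirroring the kind of dict passed to a Weights & Biases sweep.
search_space = {
    "sgd": {
        "learning_rate": lambda: 10 ** random.uniform(-4, -1),  # log-uniform
        "momentum": lambda: random.uniform(0.0, 0.99),
    },
    "adam": {
        "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    },
}

def sample_config(optimizer: str) -> dict:
    # One random-search trial: an independent draw per hyper-parameter.
    return {name: draw() for name, draw in search_space[optimizer].items()}

random.seed(0)
trial = sample_config("sgd")
```

Each trial would then train the model with the sampled configuration, and the best run is selected by held-out L2 error as described under Dataset Splits.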