Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks
Authors: Felix Dangel, Johannes Müller, Marius Zeinhofer
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS (Section 4). Section 4, Experiments: We implement KFAC, KFAC*, and ENGD with either the per-layer or full Gramian in PyTorch [46]. |
| Researcher Affiliation | Collaboration | Felix Dangel, Vector Institute, Toronto, Canada; Johannes Müller, Chair of Mathematics of Information Processing, RWTH Aachen University, Aachen, Germany; Marius Zeinhofer, Seminar for Applied Mathematics, ETH Zürich, and Department of Nuclear Medicine, University Hospital Freiburg |
| Pseudocode | Yes | Appendix B, Pseudo-Code: KFAC for the Poisson Equation; Algorithm 1: KFAC for the Poisson equation. *(a minimal preconditioning sketch follows the table)* |
| Open Source Code | Yes | We will open-source our KFAC implementations, as well as the code to fully reproduce all experiments and the original data presented in this manuscript. |
| Open Datasets | Yes | Pedagogical example: 2d Poisson equation. We start with a low-dimensional Poisson equation from Müller & Zeinhofer [41] to reproduce ENGD's performance (Figure 1). It is given by −Δu(x, y) = 2π² sin(πx) sin(πy) for (x, y) ∈ (0, 1)² and u(x, y) = 0 for (x, y) ∈ ∂[0, 1]² (Eq. 16). We choose a fixed data set of the same size as the original paper, then use random/grid search to evaluate the performance of all optimizers for different tanh-activated MLPs... *(see the residual sketch below the table)* |
| Dataset Splits | Yes | We report runs with the lowest L2 error, estimated on a held-out data set with the known solution to the studied PDE. *(see the evaluation sketch below the table)* |
| Hardware Specification | Yes | All runs are executed on a compute cluster with RTX 6000 GPUs (24 GiB RAM) in double precision. |
| Software Dependencies | No | We implement KFAC, KFAC*, and ENGD with either the per-layer or full Gramian in PyTorch [46]. We tune hyper-parameters using Weights & Biases [60]. |
| Experiment Setup | Yes | A.1 Hyper-Parameter Tuning Protocol. In all our experiments, we tune the following optimizer hyper-parameters and otherwise use the PyTorch default values. SGD: learning rate, momentum. Adam: learning rate. Hessian-free: type of curvature matrix (Hessian or GGN), damping, whether to adapt damping over time (yes or no), maximum number of CG iterations. LBFGS: learning rate, history size. ENGD: damping, factor of the exponential moving average applied to the Gramian, initialization of the Gramian (zero or identity matrix). KFAC: factor of the exponential moving average applied to the Kronecker factors, damping, momentum, initialization of the Kronecker factors (zero or identity matrix). KFAC*: factor of the exponential moving average applied to the Kronecker factors, damping, initialization of the Kronecker factors (zero or identity matrix). *(a sweep-configuration sketch follows the table)* |
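
The paper's Algorithm 1 specializes KFAC to the Gramian arising from the Poisson equation's differential operator; the sketch below shows only the generic Kronecker-factored preconditioning step that underlies such methods. The `kfac_precondition` helper, the damping value, and the empirical factor estimates are illustrative assumptions, not the authors' implementation.

```python
import torch

def kfac_precondition(grad_W, a, g, damping=1e-3):
    """Apply a damped Kronecker-factored preconditioner to one layer's
    weight gradient (hypothetical helper, not the paper's Algorithm 1).

    grad_W: (d_out, d_in) gradient of the loss w.r.t. the weight matrix
    a:      (N, d_in)  layer inputs over the batch
    g:      (N, d_out) gradients w.r.t. the layer outputs
    """
    N = a.shape[0]
    A = a.T @ a / N  # input-side Kronecker factor, (d_in, d_in)
    G = g.T @ g / N  # output-side Kronecker factor, (d_out, d_out)
    A_damped = A + damping * torch.eye(A.shape[0], dtype=A.dtype, device=A.device)
    G_damped = G + damping * torch.eye(G.shape[0], dtype=G.dtype, device=G.device)
    # (G ⊗ A + λI)^{-1} vec(∇W) is approximated by G⁻¹ ∇W A⁻¹ with per-factor damping.
    return torch.linalg.solve(G_damped, torch.linalg.solve(A_damped, grad_W.T).T)
```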
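
For the 2d Poisson benchmark in Eq. 16, the interior residual loss can be assembled with plain PyTorch autograd. This is a minimal sketch assuming a small tanh MLP and uniformly sampled collocation points; the paper's own (to-be-released) code may differ, and the boundary-condition loss is omitted for brevity.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

def interior_loss(model, n=128):
    """PINN residual loss for -Δu = 2π² sin(πx) sin(πy) on (0, 1)²
    (Eq. 16); boundary loss on ∂[0, 1]² omitted for brevity."""
    xy = torch.rand(n, 2, requires_grad=True)
    u = model(xy)
    (du,) = torch.autograd.grad(u.sum(), xy, create_graph=True)
    lap = 0.0
    for i in range(2):  # Laplacian as the trace of the Hessian, one grad per coordinate
        (d2,) = torch.autograd.grad(du[:, i].sum(), xy, create_graph=True)
        lap = lap + d2[:, i]
    f = 2 * torch.pi**2 * torch.sin(torch.pi * xy[:, 0]) * torch.sin(torch.pi * xy[:, 1])
    return ((-lap - f) ** 2).mean()
```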
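
The held-out evaluation quoted above compares against the known solution; for Eq. 16 that solution is u*(x, y) = sin(πx) sin(πy). A relative L2 estimate on random held-out points might look like the following hypothetical helper, which mirrors the described protocol rather than the authors' evaluation code.

```python
def l2_error(model, n=4096):
    """Relative L2 error against the known solution u*(x, y) = sin(πx) sin(πy),
    estimated on a held-out random set (hypothetical helper)."""
    with torch.no_grad():
        xy = torch.rand(n, 2)
        u_true = torch.sin(torch.pi * xy[:, 0]) * torch.sin(torch.pi * xy[:, 1])
        u_pred = model(xy).squeeze(-1)
        return (torch.linalg.vector_norm(u_pred - u_true)
                / torch.linalg.vector_norm(u_true)).item()
```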
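
Since hyper-parameters are tuned with Weights & Biases, the protocol in A.1 maps naturally onto a W&B sweep. The configuration below is a guess at what such a sweep could look like for the SGD entry; the project name, search ranges, and the `train` stub are all assumptions.

```python
import wandb

def train():
    """Stub entry point; a real run would build the PINN and optimizer
    from run.config and log the held-out L2 error."""
    run = wandb.init()
    # ... train with run.config.lr and run.config.momentum ...
    wandb.log({"l2_error": 1.0})  # placeholder metric

# Random search over SGD's tuned hyper-parameters (ranges are assumptions).
sweep_config = {
    "method": "random",
    "metric": {"name": "l2_error", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1.0},
        "momentum": {"distribution": "uniform", "min": 0.0, "max": 0.99},
    },
}
sweep_id = wandb.sweep(sweep_config, project="kfac-pinns")  # hypothetical project name
wandb.agent(sweep_id, function=train, count=50)
```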