Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Improving Energy Natural Gradient Descent through Woodbury, Momentum, and Randomization

Authors: Andrés Guzmán-Cordero, Felix Dangel, Gil Goldshlager, Marius Zeinhofer

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present thorough empirical evidence for our claims, demonstrating the utility of the Woodbury identity and the SPRING algorithm as well as the power and the limitations of randomization. We explore both low- and high-dimensional problems, and we meticulously tune the hyperparameters of each method to attain optimal performance, with final results supported by more than 4500 training runs in total. Our numerical experiments demonstrate that our methods outperform previous approaches, achieving the same $L^2$ error as the original ENGD up to 75× faster. See Section 4 for details.
Researcher Affiliation	Academia	Andrés Guzmán-Cordero: Vector Institute, Mila Quebec AI Institute, Université de Montréal; Felix Dangel: Vector Institute; Gil Goldshlager: UC Berkeley; Marius Zeinhofer: ETH Zurich
Pseudocode	Yes	Algorithm 1: SPRING for PINNs; Algorithm 2: GPU-Efficient Randomized Nyström Approximation
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We will open-source our implementations, as well as the code to fully reproduce all experiments and the original data presented in this work.
Open Datasets	No	To demonstrate our methods, we consider a Poisson equation $-\Delta u(x) = f(x)$ with different right-hand sides and boundary conditions on the unit hypercube $x \in [0, 1]^d$ for $d = 5$ and $d = 100$. For $d = 5$, we use the manufactured solution $u^\star(x) = \sum_{i=1}^{5} \cos(\pi x_i)$ and right-hand side $f = \pi^2 u^\star$. For $d = 100$, we use $u^\star(x) = \|x\|_2^2$ for $x \in [0, 1]^{100}$ and consequently $f = -2d$. Furthermore, we consider a 4+1d heat equation... Finally, we consider a 9+1d Fokker-Planck equation...
Dataset Splits	Yes	We sample training batches of size $N_\Omega = 3000$ and $N_{\partial\Omega} = 500$ and evaluate the $L^2$ error on a separate set of 30 000 data points using the known solution $u^\star(x) = \sum_{i=1}^{5} \cos(\pi x_i)$. All optimizers sample a new training batch each iteration, and each run is limited to 7000 s.
Hardware Specification	Yes	All experiments are run on a uniform hardware setup, an RTX 6000 GPU cluster (24 GiB memory), using double precision. Tests were performed on NVIDIA RTX 6000 GPUs (24 GiB RAM) with PyTorch's built-in timing routines...
Software Dependencies	No	We tune the following optimizer hyper-parameters and otherwise use the PyTorch default values... Tests were performed on NVIDIA RTX 6000 GPUs (24 GiB RAM) with PyTorch's built-in timing routines...
Experiment Setup	Yes	We tune the following optimizer hyper-parameters and otherwise use the PyTorch default values: SGD: learning rate, momentum; Adam: learning rate; Hessian-free: type of curvature matrix (Hessian or GGN), damping, whether to adapt damping over time (yes or no), maximum number of CG iterations; ENGD: damping, factor of the exponential moving average applied to the Gramian, initialization of the Gramian (zero or identity matrix); ENGD (Woodbury): damping, learning rate (when fixed); SPRING: damping, momentum, learning rate (when fixed); Randomized: damping, learning rate (when fixed), sketch size. We use random search from Weights & Biases to determine the hyper-parameters... We use a five-layer MLP architecture whose linear layers are Tanh-activated except for the final one: a 5 → 64 → 64 → 48 → 48 → 1 MLP with D = 10 065 trainable parameters.
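The Research Type row credits the Woodbury identity. A minimal NumPy sketch of why it helps, assuming a residual Jacobian J of shape N × D, a residual vector r, and an unnormalized Gramian G = JᵀJ (the paper's scaling of the Gramian may differ): the damped ENGD direction (JᵀJ + λI)⁻¹Jᵀr equals Jᵀ(JJᵀ + λI)⁻¹r, so a D × D solve can be replaced by an N × N one. The helper names are hypothetical.

```python
import numpy as np


def engd_step_gramian(J: np.ndarray, r: np.ndarray, damping: float) -> np.ndarray:
    """Damped ENGD direction via the D x D Gramian: (J^T J + damping*I)^{-1} J^T r."""
    D = J.shape[1]
    gramian = J.T @ J + damping * np.eye(D)
    return np.linalg.solve(gramian, J.T @ r)


def engd_step_woodbury(J: np.ndarray, r: np.ndarray, damping: float) -> np.ndarray:
    """Same direction via the N x N kernel: J^T (J J^T + damping*I)^{-1} r."""
    N = J.shape[0]
    kernel = J @ J.T + damping * np.eye(N)
    return J.T @ np.linalg.solve(kernel, r)


rng = np.random.default_rng(0)
N, D = 50, 400                   # batch smaller than parameter count (N < D)
J = rng.standard_normal((N, D))  # hypothetical residual Jacobian
r = rng.standard_normal(N)       # hypothetical residual vector
step_gramian = engd_step_gramian(J, r, damping=1e-3)
step_woodbury = engd_step_woodbury(J, r, damping=1e-3)
print(np.allclose(step_gramian, step_woodbury))  # True
```

The parameters would then be updated as θ ← θ − η · step. The Gramian solve costs O(D³), while the kernel solve costs O(N³ + N²D); with D = 10 065 parameters and batches of N_Ω = 3000 plus N_∂Ω = 500 points, the kernel is the smaller system.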
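The Pseudocode row lists SPRING for PINNs (Algorithm 1). The sketch below assumes a SPRING-style step of the form φₖ = argmin ‖Jφ − r‖² + λ‖φ − μφₖ₋₁‖², i.e. a damped Gauss-Newton direction regularized toward the previous direction scaled by a momentum μ, solved through the N × N kernel. This is an assumed form for illustration, not a transcription of Algorithm 1; the helper name and all inputs are hypothetical.

```python
import numpy as np


def spring_step(J, r, phi_prev, damping, momentum):
    """One SPRING-style direction (assumed form, hypothetical helper).

    Closed-form minimizer of
        ||J phi - r||^2 + damping * ||phi - momentum * phi_prev||^2,
    computed through the N x N kernel J J^T:
        phi = momentum * phi_prev
              + J^T (J J^T + damping * I)^{-1} (r - momentum * J phi_prev)
    """
    N = J.shape[0]
    kernel = J @ J.T + damping * np.eye(N)
    correction = np.linalg.solve(kernel, r - momentum * (J @ phi_prev))
    return momentum * phi_prev + J.T @ correction


rng = np.random.default_rng(1)
N, D = 20, 100
damping, momentum = 1e-2, 0.9
J = rng.standard_normal((N, D))    # hypothetical residual Jacobian of the current batch
r = rng.standard_normal(N)         # hypothetical residuals of the current batch
phi_prev = rng.standard_normal(D)  # direction from the previous iteration
phi = spring_step(J, r, phi_prev, damping, momentum)
# The parameter update would then be theta <- theta - learning_rate * phi.

# phi satisfies the first-order optimality condition (up to a factor 2) of the objective:
grad = J.T @ (J @ phi - r) + damping * (phi - momentum * phi_prev)
print(np.allclose(grad, 0.0, atol=1e-8))
```

With momentum set to 0 the step reduces to the Woodbury form of the damped ENGD direction; the momentum term lets information from earlier batches carry over while each solve still only involves the current batch.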
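The Pseudocode row also lists a randomized Nyström approximation (Algorithm 2). The sketch below is the textbook randomized Nyström approximation Â = AΩ(ΩᵀAΩ)⁺ΩᵀA of a symmetric PSD matrix A, computed from an eigendecomposition of the small core matrix, followed by an exact damped solve with Â via the Woodbury identity. It is not the GPU-efficient variant of Algorithm 2; the helper names, the truncation tolerance `rtol`, and whether A stands for the Gramian or the kernel matrix are assumptions.

```python
import numpy as np


def randomized_nystrom(matvec, n, sketch_size, rng, rtol=1e-10):
    """Randomized Nystrom approximation A ~= U @ diag(lam) @ U.T of a symmetric PSD n x n matrix.

    matvec(X) must return A @ X; A itself is never formed. Computes
    A_hat = Y pinv(Omega^T Y) Y^T with Y = A Omega, truncating numerically zero
    eigenvalues of the small core matrix Omega^T Y.
    """
    omega, _ = np.linalg.qr(rng.standard_normal((n, sketch_size)))  # orthonormal test matrix
    Y = matvec(omega)                        # n x l sketch A @ Omega
    core = omega.T @ Y                       # l x l core Omega^T A Omega
    core = 0.5 * (core + core.T)             # symmetrize against rounding
    s, V = np.linalg.eigh(core)
    keep = s > rtol * s.max()                # truncated pseudo-inverse of the core
    B = Y @ (V[:, keep] / np.sqrt(s[keep]))  # A_hat = B @ B.T
    U, sigma, _ = np.linalg.svd(B, full_matrices=False)
    return U, sigma**2


def nystrom_damped_solve(U, lam, damping, b):
    """Exact solution of (U diag(lam) U^T + damping * I) x = b for orthonormal U (Woodbury)."""
    Utb = U.T @ b
    return U @ (Utb / (lam + damping)) + (b - U @ Utb) / damping


rng = np.random.default_rng(2)
n, rank = 60, 5
W = rng.standard_normal((n, rank))
A = W @ W.T  # PSD test matrix of rank 5
U, lam = randomized_nystrom(lambda X: A @ X, n, sketch_size=10, rng=rng)
A_hat = U @ np.diag(lam) @ U.T
print(np.allclose(A_hat, A))  # exact up to rounding because rank(A) <= sketch size

b = rng.standard_normal(n)
x = nystrom_damped_solve(U, lam, damping=1e-3, b=b)
print(np.allclose((A_hat + 1e-3 * np.eye(n)) @ x, b))
```

Only products with A are needed, so A can stay implicit; the sketch size bounds the rank of Â. A relative truncation tolerance is used here in place of the shifted Cholesky factorization found in some published Nyström variants.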
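The Open Datasets row defines two manufactured solutions. As a quick consistency check, the sketch below estimates the Laplacian with central finite differences and compares −Δu⋆ with f at a random point, assuming the sign convention −Δu = f (the convention under which f = π²u⋆ holds for d = 5 and f = −2d for d = 100). The function names and the step size `h` are illustrative choices.

```python
import math
import random


def laplacian_fd(u, x, h=1e-3):
    """Central finite-difference estimate of the Laplacian of u at the point x."""
    center = u(x)
    total = 0.0
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        total += (u(x_plus) - 2.0 * center + u(x_minus)) / h**2
    return total


def u_star_5d(x):
    """Manufactured solution for d = 5: sum_i cos(pi * x_i)."""
    return sum(math.cos(math.pi * xi) for xi in x)


def f_5d(x):
    """Right-hand side for d = 5: pi^2 * u_star."""
    return math.pi**2 * u_star_5d(x)


def u_star_100d(x):
    """Manufactured solution for d = 100: squared Euclidean norm."""
    return sum(xi * xi for xi in x)


def f_100d(x):
    """Right-hand side for d = 100: -2d."""
    return -2.0 * len(x)


random.seed(0)
x5 = [random.random() for _ in range(5)]
x100 = [random.random() for _ in range(100)]
print(-laplacian_fd(u_star_5d, x5), f_5d(x5))
print(-laplacian_fd(u_star_100d, x100), f_100d(x100))
```

Each cosine term satisfies −∂²/∂xᵢ² cos(πxᵢ) = π² cos(πxᵢ), and each squared coordinate has second derivative 2, which is where the two right-hand sides come from.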
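The Dataset Splits row describes training batches resampled every iteration plus a separate evaluation set. A minimal NumPy sketch for the d = 5 problem, assuming uniform sampling in the interior and on the boundary of [0, 1]^d and an absolute $L^2$ error estimated by Monte Carlo; the row states neither the sampling distribution nor whether the error is absolute or relative. The helper names and the stand-in `model` are hypothetical.

```python
import numpy as np


def sample_interior(n, d, rng):
    """n points drawn uniformly from the unit hypercube [0, 1)^d."""
    return rng.random((n, d))


def sample_boundary(n, d, rng):
    """n points drawn uniformly from the boundary of [0, 1]^d.

    All 2d faces have unit area, so picking a face uniformly and then a uniform
    point on that face gives the uniform distribution on the boundary.
    """
    x = rng.random((n, d))
    dims = rng.integers(0, d, size=n)   # coordinate fixed on the chosen face
    sides = rng.integers(0, 2, size=n)  # face at x_i = 0 or at x_i = 1
    x[np.arange(n), dims] = sides
    return x


def l2_error(model, u_star, x_eval):
    """Monte Carlo estimate of ||model - u_star||_{L^2([0,1]^d)}; the domain has volume 1."""
    diff = model(x_eval) - u_star(x_eval)
    return float(np.sqrt(np.mean(diff**2)))


def u_star(x):
    """Manufactured solution for d = 5: sum_i cos(pi * x_i)."""
    return np.cos(np.pi * x).sum(axis=1)


def model(x):
    """Hypothetical stand-in for a trained network, off by a constant 1e-3."""
    return u_star(x) + 1e-3


rng = np.random.default_rng(0)
d = 5
x_eval = sample_interior(30_000, d, rng)  # separate evaluation set
for _ in range(3):                        # a new training batch every iteration
    x_interior = sample_interior(3000, d, rng)  # N_Omega = 3000
    x_boundary = sample_boundary(500, d, rng)   # N_dOmega = 500
    # one optimizer step on (x_interior, x_boundary) would go here
print(l2_error(model, u_star, x_eval))  # approximately 0.001
```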
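The Experiment Setup row states D = 10 065 trainable parameters. Counting weights and biases layer by layer for the 5 → 64 → 64 → 48 → 48 → 1 widths reproduces that number, assuming every linear layer has a bias (the default for PyTorch's `torch.nn.Linear`); the Tanh activations add no parameters. The helper name is hypothetical.

```python
def mlp_param_count(widths):
    """Trainable parameters of a fully connected MLP: weights plus one bias per output unit."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths[:-1], widths[1:]))


widths = [5, 64, 64, 48, 48, 1]
print(mlp_param_count(widths))  # 10065
```

Per layer this is 384 + 4160 + 3120 + 2352 + 49 = 10 065.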