A generalized neural tangent kernel for surrogate gradient learning

Authors: Luke Eilers, Raoul-Martin Memmesheimer, Sven Goedeke

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Further, we illustrate our findings with numerical experiments. Finally, we numerically compare SGL in networks with sign activation function and finite width to kernel regression with the surrogate gradient NTK; the results confirm that the surrogate gradient NTK provides a good characterization of SGL. From Section 3 (Numerical experiments): We numerically illustrate the divergence of the analytic NTK, Θ_{erf_m}, shown in Section 2.3, and the convergence of the empirical SG-NTK to the analytic SG-NTK in the infinite-width limit, at initialization and during training, shown in Section 2.4. Simultaneously, we visualize the convergence of the analytic SG-NTK for erf_m to the analytic SG-NTK for sign. We consider a regression problem on the unit sphere S¹ = {x ∈ ℝ² : ‖x‖ = 1} with |X| = 15 training points, which is shown in Figure B.1, and train 10 fully connected feedforward networks with two hidden layers and activation function erf_m for t = 10000 time steps with MSE loss.
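As a reading aid, here is a minimal, hypothetical sketch of the setup described in the quote (15 training inputs on the unit circle, a two-hidden-layer fully connected network with an erf_m-style activation), written with JAX and Neural Tangents since those are the packages the paper names. The hidden width n, the slope m, and the toy regression target below are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical sketch of the described setup; width n, slope m, and the target
# function are assumptions made for illustration only.
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

key = random.PRNGKey(0)

# 15 training inputs on the unit circle S^1 = {x in R^2 : ||x|| = 1}
angles = random.uniform(key, (15,), minval=0.0, maxval=2 * jnp.pi)
x_train = jnp.stack([jnp.cos(angles), jnp.sin(angles)], axis=1)
y_train = jnp.sin(2 * angles)[:, None]  # assumed toy regression target

n, m = 512, 5.0  # hidden width and erf_m slope (assumed)

# Two hidden layers with erf_m(x) = erf(m * x), initialized with
# sigma_w = 1 and sigma_b = 0.1 as stated in the experiment setup.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(n, W_std=1.0, b_std=0.1), stax.Erf(b=m),
    stax.Dense(n, W_std=1.0, b_std=0.1), stax.Erf(b=m),
    stax.Dense(1, W_std=1.0, b_std=0.1),
)
```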
Researcher Affiliation | Academia | Luke Eilers (Department of Physiology, University of Bern, Switzerland; Institute for Applied Mathematics, University of Bonn, Germany; luke.eilers@unibe.ch); Raoul-Martin Memmesheimer (Institute of Genetics, University of Bonn, Germany; rm.memmesheimer@uni-bonn.de); Sven Goedeke (Bernstein Center Freiburg, University of Freiburg, Germany; Institute of Genetics, University of Bonn, Germany; sven.goedeke@bcf.uni-freiburg.de)
Pseudocode | No | The paper describes methods and derivations using mathematical equations and textual explanations but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code | Yes | For the implementation of the NTK and SG-NTK we use the JAX package [Bradbury et al., 2018] and Neural Tangents package [Novak et al., 2020, 2022, Han et al., 2022, Sohl-Dickstein et al., 2020, Hron et al., 2020] with modifications. (Confirmed by the paper's checklist: the code is provided in the supplementary material with instructions to reproduce the experiments and figures.)
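To illustrate how the named packages expose the relevant quantities, the sketch below reuses init_fn, apply_fn, kernel_fn, and x_train from the architecture sketch above and computes the analytic infinite-width NTK alongside an empirical finite-width NTK through the stock Neural Tangents API. Since the paper says its implementation uses these packages "with modifications", this shows only the unmodified library interface, not the authors' code.

```python
# Stock Neural Tangents API only; the authors report using modified versions.
# Reuses init_fn, apply_fn, kernel_fn, and x_train from the sketch above.
import neural_tangents as nt
from jax import random

# Analytic (infinite-width) NTK on the training inputs.
theta_analytic = kernel_fn(x_train, x_train, 'ntk')

# Empirical (finite-width) NTK for one randomly initialized network.
_, params = init_fn(random.PRNGKey(1), x_train.shape)
empirical_ntk = nt.empirical_ntk_fn(apply_fn)
theta_empirical = empirical_ntk(x_train, None, params)

print(theta_analytic.shape, theta_empirical.shape)  # both (15, 15)
```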
Open Datasets | No | We consider a regression problem on the unit sphere S¹ = {x ∈ ℝ² : ‖x‖ = 1} with |X| = 15 training points, which is shown in Figure B.1. The training data are synthetic points for a toy regression problem rather than a publicly released dataset.
Dataset Splits | No | We consider a regression problem on the unit sphere S¹ = {x ∈ ℝ² : ‖x‖ = 1} with |X| = 15 training points, which is shown in Figure B.1, and train 10 fully connected feedforward networks with two hidden layers and activation function erf_m for t = 10000 time steps with MSE loss. The paper does not mention validation or test splits.
Hardware Specification | Yes | Computations were done using an Intel Core i7-1355U CPU and 16 GB RAM.
Software Dependencies | No | For the implementation of the NTK and SG-NTK we use the JAX package [Bradbury et al., 2018] and Neural Tangents package [Novak et al., 2020, 2022, Han et al., 2022, Sohl-Dickstein et al., 2020, Hron et al., 2020] with modifications. While packages are named, specific version numbers (e.g., JAX version X.Y.Z) are not explicitly stated.
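Because the packages are named but unversioned, a reproduction would need to pin its own environment; one way to record the versions actually installed is sketched below. The attribute lookups are standard for these packages, but the versions the authors used remain unknown.

```python
# Record the library versions of the reproduction environment,
# since the paper does not state which versions were used.
import jax
import neural_tangents

print("jax:", jax.__version__)
print("neural_tangents:", neural_tangents.__version__)
```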
Experiment Setup | Yes | We consider a regression problem on the unit sphere S¹ = {x ∈ ℝ² : ‖x‖ = 1} with |X| = 15 training points, which is shown in Figure B.1, and train 10 fully connected feedforward networks with two hidden layers and activation function erf_m for t = 10000 time steps with MSE loss. We plot empirical and analytic NTKs of 10 networks for different hidden layer widths n and activation functions erf_m. The kernels are plotted at initialization and after gradient descent training with t = 10^4 time steps, learning rate η = 0.1, and MSE loss. All networks are initialized with σ_w = 1, σ_b = 0.1.
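For concreteness, a minimal full-batch gradient-descent loop matching the stated hyperparameters (learning rate η = 0.1, t = 10^4 steps, MSE loss) might look like the sketch below, again reusing apply_fn, init_fn, x_train, and y_train from the earlier sketches. The plain gradient-descent update is an assumption consistent with, but not confirmed by, the quoted setup.

```python
# Assumed full-batch gradient descent with the stated eta = 0.1 and 1e4 steps;
# reuses apply_fn, init_fn, x_train, y_train from the sketches above.
import jax
import jax.numpy as jnp
from jax import random

def mse_loss(params, x, y):
    # Mean-squared-error loss on the network outputs.
    return 0.5 * jnp.mean((apply_fn(params, x) - y) ** 2)

@jax.jit
def gd_step(params, x, y, lr=0.1):
    # One full-batch gradient-descent update.
    grads = jax.grad(mse_loss)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

_, params = init_fn(random.PRNGKey(2), x_train.shape)
for _ in range(10_000):
    params = gd_step(params, x_train, y_train)
```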