Characterizing the spectrum of the NTK via a power series expansion

Authors: Michael Murray, Hui Jin, Benjamin Bowman, Guido Montúfar

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In Figure 1 we empirically validate our theory by computing the spectrum of the NTK on both Caltech101 (Li et al., 2022) and isotropic Gaussian data for feedforward networks."
Researcher Affiliation | Academia | Department of Mathematics, UCLA, CA, USA; Department of Statistics, UCLA, CA, USA; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany. [mmurray,huijin,benbowman314,montufar]@math.ucla.edu
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Reproducibility Statement: "To ensure reproducibility, we make the code public at https://github.com/bbowman223/data_ntk."
Open Datasets | Yes | "In Figure 1 we empirically validate our theory by computing the spectrum of the NTK on both Caltech101 (Li et al., 2022) and isotropic Gaussian data for feedforward networks."
Dataset Splits | No | The paper mentions a batch size of n = 200 and plotting the first 100 eigenvalues, but does not specify how the datasets (Caltech101, isotropic Gaussian data) were split into training, validation, or test sets.
Hardware Specification | No | The paper states "We use the functorch module in PyTorch (Paszke et al., 2019)" and describes its experiments, but gives no details on the hardware (e.g., GPU/CPU models, memory) used to run them.
Software Dependencies | No | The paper mentions the "functorch module in PyTorch (Paszke et al., 2019)" but does not provide version numbers for these or any other software dependencies needed for replication.
Experiment Setup | Yes | "For the feedforward architectures we consider networks of depth 2 and 5 with the width of all layers being set at 500. With regard to the activation function we test linear, ReLU and Tanh, and in terms of initialization we use Kaiming uniform (He et al., 2015)... For the convolutional architectures we again consider depths 2 and 5, with each layer consisting of 100 channels with the filter size set to 5x5... The batch size is fixed at 200 and we plot only the first 100 normalized eigenvalues. Each experiment was repeated 10 times."
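The quoted setup is enough to sketch how one such NTK-spectrum experiment could be reproduced. The snippet below is a minimal illustration, not the authors' released code (that lives at the GitHub repository linked above). It assumes the torch.func API (the successor to the functorch module the paper cites), a scalar network output, a placeholder input dimension d = 100, isotropic Gaussian inputs, zero-initialized biases, and normalization of the eigenvalues by the largest one; all of these choices are assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jacrev, vmap

def make_mlp(depth=2, width=500, d_in=100, d_out=1):
    """ReLU network of the quoted depth/width; Kaiming-uniform weights, zero biases (bias init is an assumption)."""
    dims = [d_in] + [width] * (depth - 1) + [d_out]
    layers = []
    for i in range(depth):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < depth - 1:
            layers.append(nn.ReLU())
    net = nn.Sequential(*layers)
    for m in net.modules():
        if isinstance(m, nn.Linear):
            nn.init.kaiming_uniform_(m.weight)
            nn.init.zeros_(m.bias)
    return net

def empirical_ntk(net, x):
    """Gram matrix K[i, j] = inner product of the parameter gradients of f at x_i and x_j (scalar-output f)."""
    params = dict(net.named_parameters())

    def f(p, xi):
        # Forward pass with explicit parameters; xi is a single input sample.
        return functional_call(net, p, (xi.unsqueeze(0),)).squeeze()

    # Per-sample Jacobians w.r.t. all parameters, batched over x via vmap.
    jac = vmap(jacrev(f), in_dims=(None, 0))(params, x)
    jac_flat = torch.cat([j.reshape(x.shape[0], -1) for j in jac.values()], dim=1)
    return jac_flat @ jac_flat.T

torch.manual_seed(0)
n, d = 200, 100                    # batch size 200 as quoted; d = 100 is a placeholder input dimension
x = torch.randn(n, d) / d ** 0.5   # isotropic Gaussian inputs, roughly unit norm
net = make_mlp(depth=2, width=500, d_in=d)

K = empirical_ntk(net, x)
eigvals = torch.linalg.eigvalsh(K).flip(0)   # eigenvalues in descending order
top_100 = eigvals[:100] / eigvals[0]         # normalized by the largest eigenvalue (one plausible reading of "normalized")
print(top_100[:5])
```

Swapping the depth, width, activation, and input data covers the other feedforward configurations listed in the setup; the convolutional variants would replace make_mlp with a stack of 100-channel, 5x5 convolutions.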