A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel

Authors: Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate the quality of this approximation for various uses across a range of settings.
Researcher Affiliation | Academia | Computer Science Department, University of British Columbia, Vancouver, Canada; Alberta Machine Intelligence Institute, Edmonton, Canada.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | Lastly, to help the community better analyze the properties of NNs and their training dynamics, and avoid wasting computation by redoing this work, we plan to share computed pNTKs for all the mentioned architectures and widths...
Open Datasets | Yes | We focus on data from CIFAR-10 (Krizhevsky, 2009).
Dataset Splits | No | The paper mentions using CIFAR-10 for training but does not provide specific details on validation splits (e.g., percentages or sample counts).
Hardware Specification | Yes | All models are trained for 200 epochs, using stochastic gradient descent (SGD), on 32GB NVIDIA V100 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or frameworks used in the experiments.
Experiment Setup | Yes | A constant batch size of 128 was used across all different networks and different dataset sizes used for training. The learning rate for all networks was also fixed to 0.1. However, not all networks were trainable with this fixed learning rate, as the gradients would sometimes blow up and give NaN training loss, typically for the largest width of each mentioned architecture. In those cases, we decreased the learning rate to 0.01 to train the networks. ... a weight decay of 0.0001 along with a momentum of 0.9 for SGD is used. (See the training-setup sketch after this table.)
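The Experiment Setup row above amounts to a training configuration: SGD with batch size 128, learning rate 0.1 (dropped to 0.01 when the loss becomes NaN), momentum 0.9, weight decay 0.0001, and 200 epochs on CIFAR-10. The sketch below is a minimal, hypothetical rendering of that configuration; the framework (PyTorch), the normalization constants, and the ResNet-18 placeholder architecture are assumptions, not the authors' released code.

```python
# Hypothetical training-setup sketch matching the reported hyperparameters.
# PyTorch, the CIFAR-10 normalization values, and the ResNet-18 placeholder
# model are assumptions; only the hyperparameters come from the paper.
import torch
import torchvision
import torchvision.transforms as T

# CIFAR-10 with a constant batch size of 128.
transform = T.Compose([
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(
    train_set, batch_size=128, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet18(num_classes=10).to(device)  # placeholder

# SGD with lr 0.1 (0.01 for widths where training diverges),
# momentum 0.9, and weight decay 1e-4, for 200 epochs.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(200):
    for inputs, targets in train_loader:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```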