Neural Tangent Kernel: Convergence and Generalization in Neural Networks

Authors: Arthur Jacot, Franck Gabriel, Clément Hongler

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit. In the following numerical experiments, fully connected ANNs of various widths are compared to the theoretical infinite-width limit."
Researcher Affiliation | Academia | Arthur Jacot, École Polytechnique Fédérale de Lausanne, arthur.jacot@netopera.net; Franck Gabriel, Imperial College London and École Polytechnique Fédérale de Lausanne, franckrgabriel@gmail.com; Clément Hongler, École Polytechnique Fédérale de Lausanne, clement.hongler@gmail.com
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | "We now illustrate our result on the MNIST dataset of handwritten digits made up of grayscale images of dimension 28 × 28, yielding a dimension of n_0 = 784."
Dataset Splits | No | The paper mentions using the MNIST dataset and an artificial dataset, but does not specify any training, validation, or test dataset splits or cross-validation methodology.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for the experiments are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for any libraries or frameworks used in the experiments.
Experiment Setup | Yes | "In our numerical experiments, we take β = 0.1 and use a learning rate of 1.0, which is larger than usual, see Section 6. This gives a behaviour similar to that of a classical network of width 100 with a learning rate of 0.01." The paper also describes training "after 200 steps of gradient descent with learning rate 1.0 (i.e. at t = 200)" and "for 1000 steps with learning rate 1.0". (Illustrative sketches for the Research Type and Experiment Setup rows follow this table.)
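
The Research Type row above refers to computing the NTK numerically for finite-width fully connected networks and comparing it to the infinite-width limit. The sketch below shows one way to compute the empirical kernel Θ(x, x') = ⟨∂_θ f_θ(x), ∂_θ f_θ(x')⟩ for a network in the paper's NTK parameterization, where the 1/√n_in and β scalings are applied in the forward pass and parameters are initialized as standard Gaussians. JAX, the tanh nonlinearity, the layer widths, and the random 784-dimensional inputs are illustrative assumptions, not the paper's exact tooling or configuration.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

BETA = 0.1  # bias scaling used in the paper's experiments

def init_params(key, layer_sizes):
    # Weights and biases drawn i.i.d. N(0, 1); the NTK parameterization applies
    # the 1/sqrt(n_in) and beta scalings in the forward pass instead.
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        key, kw, kb = jax.random.split(key, 3)
        params.append((jax.random.normal(kw, (n_out, n_in)),
                       jax.random.normal(kb, (n_out,))))
    return params

def forward(params, x):
    # Fully connected network, NTK parameterization:
    #   pre-activation = (1 / sqrt(n_in)) * W h + beta * b
    h = x
    for i, (W, b) in enumerate(params):
        z = W @ h / jnp.sqrt(h.shape[0]) + BETA * b
        # tanh on hidden layers is an illustrative choice, not taken from the paper
        h = jnp.tanh(z) if i < len(params) - 1 else z
    return h[0]  # scalar output

def empirical_ntk(params, x1, x2):
    # Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>
    g1, _ = ravel_pytree(jax.grad(forward)(params, x1))
    g2, _ = ravel_pytree(jax.grad(forward)(params, x2))
    return jnp.dot(g1, g2)

# A 28x28 grayscale MNIST image flattens to an n_0 = 784 dimensional input;
# random vectors stand in for real images here.
x1 = jax.random.normal(jax.random.PRNGKey(1), (784,))
x2 = jax.random.normal(jax.random.PRNGKey(2), (784,))
for width in (50, 500, 2000):
    params = init_params(jax.random.PRNGKey(0), [784, width, width, 1])
    print(width, float(empirical_ntk(params, x1, x2)))
# As width grows, the kernel values fluctuate less across random initializations,
# consistent with convergence to the deterministic infinite-width NTK.
```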
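
For the Experiment Setup row, the quoted hyperparameters (learning rate 1.0 under the NTK parameterization, 200 or 1000 gradient-descent steps) can be plugged into a plain training loop. This sketch reuses forward and init_params from the snippet above; the full-batch squared loss and the data placeholders are assumptions for illustration, not necessarily the paper's exact training procedure.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 1.0  # the paper's (unusually large) rate under the NTK parameterization
NUM_STEPS = 200      # the paper also reports results after 1000 steps

def loss(params, xs, ys):
    # Full-batch squared loss; an illustrative choice, not confirmed as the paper's cost.
    preds = jax.vmap(lambda x: forward(params, x))(xs)
    return 0.5 * jnp.mean((preds - ys) ** 2)

@jax.jit
def gd_step(params, xs, ys):
    # One step of full-batch gradient descent with the fixed learning rate.
    grads = jax.grad(loss)(params, xs, ys)
    return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)

# Hypothetical usage, with xs of shape (N, 784) and ys of shape (N,):
# params = init_params(jax.random.PRNGKey(0), [784, 500, 500, 1])
# for _ in range(NUM_STEPS):
#     params = gd_step(params, xs, ys)
```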