Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Authors: Arthur Jacot, Franck Gabriel, Clément Hongler
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally we study the NTK numerically, observe its behavior for wide networks, and compare it to the infinite-width limit. In the following numerical experiments, fully connected ANNs of various widths are compared to the theoretical infinite-width limit. |
| Researcher Affiliation | Academia | Arthur Jacot, École Polytechnique Fédérale de Lausanne, arthur.jacot@netopera.net; Franck Gabriel, Imperial College London and École Polytechnique Fédérale de Lausanne, franckrgabriel@gmail.com; Clément Hongler, École Polytechnique Fédérale de Lausanne, clement.hongler@gmail.com |
| Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We now illustrate our result on the MNIST dataset of handwritten digits made up of grayscale images of dimension 28 × 28, yielding a dimension of n0 = 784. |
| Dataset Splits | No | The paper mentions using the MNIST dataset and an artificial dataset, but does not specify any training, validation, or test dataset splits or cross-validation methodology. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for experiments are mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for any libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | In our numerical experiments, we take β = 0.1 and use a learning rate of 1.0, which is larger than usual, see Section 6. This gives a behaviour similar to that of a classical network of width 100 with a learning rate of 0.01. [...] After 200 steps of gradient descent with learning rate 1.0 (i.e. at t = 200). [...] for 1000 steps with learning rate 1.0. |
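
To make the quoted setup concrete, here is a minimal sketch (not the authors' code, which is not released per the table above) of a finite-width fully connected network in the NTK parameterization, using the β = 0.1 and learning rate 1.0 values quoted in the Experiment Setup row. The helper names (`NTKLinear`, `make_net`, `empirical_ntk`), the ReLU nonlinearity, the layer widths, and the random data are illustrative assumptions; only the parameterization, β, the learning rate, and the 200 gradient-descent steps come from the paper's quoted text.

```python
# Hypothetical sketch in PyTorch: NTK-parameterized fully connected network,
# empirical NTK estimate, and full-batch gradient descent with lr = 1.0.
import torch

BETA = 0.1  # bias scaling beta = 0.1, as quoted from the paper's experiments

class NTKLinear(torch.nn.Module):
    """Layer in NTK parameterization: W x / sqrt(n_in) + beta * b, with N(0,1) init."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(n_out, n_in))
        self.b = torch.nn.Parameter(torch.randn(n_out))
        self.n_in = n_in

    def forward(self, x):
        return x @ self.W.t() / self.n_in ** 0.5 + BETA * self.b

def make_net(widths):
    """widths = [n_0, ..., n_L]; ReLU between layers (choice of nonlinearity assumed)."""
    layers = []
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        layers += [NTKLinear(n_in, n_out), torch.nn.ReLU()]
    return torch.nn.Sequential(*layers[:-1])  # last layer stays linear

def empirical_ntk(net, x1, x2):
    """Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> for a scalar-output net."""
    params = list(net.parameters())
    g1 = torch.autograd.grad(net(x1).sum(), params)
    g2 = torch.autograd.grad(net(x2).sum(), params)
    return sum((a * b).sum() for a, b in zip(g1, g2))

if __name__ == "__main__":
    torch.manual_seed(0)
    net = make_net([784, 500, 500, 1])       # n_0 = 784 matches the MNIST input dimension
    x1, x2 = torch.randn(1, 784), torch.randn(1, 784)
    print(float(empirical_ntk(net, x1, x2)))  # approaches the limiting NTK as width grows

    # Full-batch gradient descent with the paper's (unusually large) learning rate of 1.0;
    # the data here is random and purely illustrative.
    xs, ys = torch.randn(16, 784), torch.randn(16, 1)
    opt = torch.optim.SGD(net.parameters(), lr=1.0)
    for _ in range(200):                      # "200 steps of gradient descent"
        opt.zero_grad()
        loss = ((net(xs) - ys) ** 2).mean()
        loss.backward()
        opt.step()
```

Comparing `empirical_ntk` before and after the training loop for networks of increasing width is one way to reproduce the paper's observation that the kernel becomes nearly constant during training as the width grows.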