A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel
Authors: Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate the quality of this approximation for various uses across a range of settings. |
| Researcher Affiliation | Academia | Computer Science Department, University of British Columbia, Vancouver, Canada; Alberta Machine Intelligence Institute, Edmonton, Canada. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Lastly, to help the community better analyze the properties of NNs and their training dynamics, and avoid wasting computation by redoing this work, we plan to share computed pNTKs for all the mentioned architectures and widths... |
| Open Datasets | Yes | We focus on data from CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | No | The paper mentions using CIFAR-10 for training but does not provide specific details on validation splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All models are trained for 200 epochs, using stochastic gradient descent (SGD), on 32GB NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries or frameworks used in the experiments. |
| Experiment Setup | Yes | A constant batch size of 128 was used across all different networks and different dataset sizes used for training. The learning rate for all networks was also fixed to 0.1. However, not all networks were trainable with this fixed learning rate, as the gradients would sometimes blow up and give NaN training loss, typically for the largest width of each mentioned architecture. In those cases, we decreased the learning rate to 0.01 to train the networks. ... a weight decay of 0.0001 along with a momentum of 0.9 for SGD is used. (A minimal sketch of this configuration follows the table.) |
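To make the quoted experiment setup concrete, the following is a minimal sketch of such a training loop in PyTorch. The framework choice, model interface, data loading, and NaN-handling policy are illustrative assumptions, not the authors' released code; only the hyperparameters (CIFAR-10, batch size 128, SGD with learning rate 0.1 or 0.01, momentum 0.9, weight decay 0.0001, 200 epochs) come from the paper's description.

```python
# Sketch of the described training configuration, assuming a PyTorch workflow.
# The model is passed in by the caller (e.g., a ResNet variant); the data
# pipeline and divergence handling here are assumptions for illustration.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

def make_loader(batch_size=128):
    # CIFAR-10 with a constant batch size of 128, as stated in the setup.
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor())
    return torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=True, num_workers=4)

def train(model, device="cuda", epochs=200, lr=0.1):
    # SGD with momentum 0.9 and weight decay 1e-4, per the quoted setup.
    # The paper reports falling back to lr=0.01 when training diverges
    # (NaN loss), typically for the widest variant of each architecture.
    model = model.to(device)
    optimizer = torch.optim.SGD(
        model.parameters(), lr=lr, momentum=0.9, weight_decay=1e-4)
    criterion = nn.CrossEntropyLoss()
    loader = make_loader()
    for epoch in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            if torch.isnan(loss):
                # Diverged: the described remedy is to restart with lr=0.01.
                raise RuntimeError("NaN training loss; retry with lr=0.01")
            loss.backward()
            optimizer.step()
```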