Neural (Tangent Kernel) Collapse
Authors: Mariia Seleznova, Dana Weitzner, Raja Giryes, Gitta Kutyniok, Hung-Hsu Chou
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide large-scale numerical experiments on three common DNN architectures and three benchmark datasets to support our theory. |
| Researcher Affiliation | Academia | 1Ludwig-Maximilians-Universität München 2Tel Aviv University |
| Pseudocode | No | The paper contains mathematical derivations and theoretical analyses but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code to reproduce the results is available in the project's GitHub repository. |
| Open Datasets | Yes | Our datasets are MNIST [35], Fashion MNIST [51] and CIFAR10 [34]. |
| Dataset Splits | No | The paper does not provide specific details on training/validation/test dataset splits, such as percentages or sample counts. It mentions training for 400 epochs but does not describe how the data are split. |
| Hardware Specification | Yes | We executed the numerical experiments mainly on NVIDIA GeForce RTX 3090 Ti GPUs; each model was trained on a single GPU. |
| Software Dependencies | No | We use JAX [8] and Flax (neural network library for JAX) [25] to implement all the DNN architectures and the training routines. While these software components are mentioned, specific version numbers are not provided for JAX or Flax. (A minimal Flax sketch follows the table.) |
| Experiment Setup | Yes | We use SGD with Nesterov momentum 0.9 and weight decay 5e-4. Every model is trained for 400 epochs with batches of size 120. To be consistent with the theory, we balance the batches exactly. We train every model with a set of initial learning rates spaced logarithmically in the range η ∈ [10^-4, 10^0.25]. The learning rate is divided by 10 every 120 epochs. (An optimizer sketch follows the table.) |
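The Software Dependencies row notes that the architectures and training routines are implemented in JAX and Flax. The snippet below is a minimal sketch of how a Flax model can be defined and initialized; the `MLP` class, its widths, and the CIFAR10-shaped dummy input are illustrative assumptions, not the paper's actual architectures (those are in the authors' repository).

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class MLP(nn.Module):
    """Illustrative fully connected network; not the paper's actual architectures."""
    hidden_width: int = 512   # assumed width, for illustration only
    num_classes: int = 10     # MNIST / Fashion MNIST / CIFAR10 all have 10 classes

    @nn.compact
    def __call__(self, x):
        x = x.reshape((x.shape[0], -1))              # flatten image inputs
        x = nn.relu(nn.Dense(self.hidden_width)(x))
        x = nn.relu(nn.Dense(self.hidden_width)(x))
        return nn.Dense(self.num_classes)(x)         # class logits

# Initialize parameters for a CIFAR10-shaped dummy batch.
model = MLP()
params = model.init(jax.random.PRNGKey(0), jnp.ones((1, 32, 32, 3)))
```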
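The Experiment Setup row describes SGD with Nesterov momentum 0.9, weight decay 5e-4, batch size 120, initial learning rates log-spaced in [10^-4, 10^0.25], and a step schedule that divides the learning rate by 10 every 120 epochs over 400 epochs. A plausible way to express this in JAX is with Optax, as sketched below; the use of Optax, the number of learning rates in the sweep, and whether weight decay is coupled or decoupled are assumptions not stated in this summary.

```python
import numpy as np
import optax

# Initial learning rates spaced logarithmically in [10^-4, 10^0.25];
# the number of points in the sweep (here 10) is an assumption.
initial_lrs = np.logspace(-4, 0.25, num=10)

def make_optimizer(eta0: float, steps_per_epoch: int) -> optax.GradientTransformation:
    """SGD with Nesterov momentum 0.9, weight decay 5e-4, and a step LR schedule."""
    # Divide the learning rate by 10 every 120 epochs (400 epochs total).
    schedule = optax.piecewise_constant_schedule(
        init_value=eta0,
        boundaries_and_scales={
            120 * steps_per_epoch: 0.1,
            240 * steps_per_epoch: 0.1,
            360 * steps_per_epoch: 0.1,
        },
    )
    return optax.chain(
        # Weight decay 5e-4, added to the gradients before the SGD update;
        # whether the paper couples or decouples the decay is an assumption.
        optax.add_decayed_weights(5e-4),
        optax.sgd(learning_rate=schedule, momentum=0.9, nesterov=True),
    )
```

With batches of size 120 that are exactly class-balanced, `steps_per_epoch` would be the number of training samples divided by 120.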