Neural (Tangent Kernel) Collapse

Authors: Mariia Seleznova, Dana Weitzner, Raja Giryes, Gitta Kutyniok, Hung-Hsu Chou

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We provide large-scale numerical experiments on three common DNN architectures and three benchmark datasets to support our theory. |
| Researcher Affiliation | Academia | Ludwig-Maximilians-Universität München; Tel Aviv University |
| Pseudocode | No | The paper contains mathematical derivations and theoretical analyses but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code to reproduce the results is available in the project's GitHub repository. |
| Open Datasets | Yes | Our datasets are MNIST [35], Fashion-MNIST [51] and CIFAR10 [34]. |
| Dataset Splits | No | The paper does not provide specific details on training/validation/test dataset splits, such as percentages or sample counts. It mentions training for 400 epochs but does not describe how the data is partitioned. |
| Hardware Specification | Yes | We executed the numerical experiments mainly on NVIDIA GeForce RTX 3090 Ti GPUs; each model was trained on a single GPU. |
| Software Dependencies | No | We use JAX [8] and Flax (a neural network library for JAX) [25] to implement all the DNN architectures and the training routines. While these software components are mentioned, specific version numbers are not provided for JAX or Flax. |
| Experiment Setup | Yes | We use SGD with Nesterov momentum 0.9 and weight decay 5e-4. Every model is trained for 400 epochs with batches of size 120. To be consistent with the theory, we balance the batches exactly. We train every model with a set of initial learning rates spaced logarithmically in the range η ∈ [10^-4, 10^0.25]. The learning rate is divided by 10 every 120 epochs. |