Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Neural Tangent Kernel Analysis of Deep Narrow Neural Networks

Authors: Jongmin Lee, Joo Young Choi, Ernest K Ryu, Albert No

ICML 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this section, we experimentally demonstrate the invariance of the scaled NTK and the trainability of deep neural networks. The code is provided as supplementary material." |
| Researcher Affiliation | Academia | "(1) Department of Mathematical Sciences, Seoul National University, Seoul, Korea; (2) Department of Electronic and Electrical Engineering, Hongik University, Seoul, Korea." |
| Pseudocode | No | The paper contains detailed mathematical derivations and proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | "The code is provided as supplementary material." |
| Open Datasets | Yes | "Next, we demonstrate the empirical trainability of the deep narrow networks on the MNIST dataset." |
| Dataset Splits | No | The paper mentions training and testing on the MNIST dataset, but it does not specify any explicit validation splits (e.g., percentages, sample counts, or predefined splits). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper states that "the code is provided as supplementary material" but does not list any software dependencies with version numbers. |
| Experiment Setup | Yes | "We train L-layer MLPs with d_in = 784 and d_out = 10 using the quadratic loss with one-hot vectors as targets. To establish a point of comparison, we attempt to train a 1000-layer MLP with the typical Kaiming He uniform initialization (He et al., 2015). We tuned the learning rate via a grid search from 0.00001 to 1.0, but the network was untrainable, as one would expect based on the prior findings of He & Sun (2015), Srivastava et al. (2015), and Huang et al. (2020)." |
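The quoted setup (an L-layer MLP with d_in = 784 and d_out = 10, quadratic loss on one-hot targets, Kaiming He uniform initialization, and a learning-rate grid search) can be sketched as follows. This is a minimal NumPy illustration, not the paper's code: the hidden width of 64, the depth of 5, the synthetic random inputs standing in for MNIST, and the three-point learning-rate grid are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_uniform(fan_in, fan_out):
    # Kaiming He uniform initialization (He et al., 2015):
    # U(-b, b) with b = sqrt(6 / fan_in) (ReLU gain sqrt(2) times sqrt(3 / fan_in)).
    b = np.sqrt(6.0 / fan_in)
    return rng.uniform(-b, b, size=(fan_in, fan_out))

def make_mlp(depth, width=64, d_in=784, d_out=10):
    # `depth` weight matrices; hidden width 64 is an illustrative choice,
    # not a value taken from the paper.
    dims = [d_in] + [width] * (depth - 1) + [d_out]
    return [he_uniform(dims[i], dims[i + 1]) for i in range(depth)]

def forward(Ws, x):
    # Return all layer activations; the last entry is the linear output.
    acts = [x]
    for W in Ws[:-1]:
        acts.append(np.maximum(acts[-1] @ W, 0.0))  # ReLU hidden layers
    acts.append(acts[-1] @ Ws[-1])
    return acts

def train_step(Ws, x, y, lr):
    # One full-batch gradient step on the quadratic loss mean((f(x) - y)^2).
    acts = forward(Ws, x)
    loss = np.mean((acts[-1] - y) ** 2)
    grad = 2.0 * (acts[-1] - y) / y.size          # dLoss / dOutput
    for i in reversed(range(len(Ws))):
        g = acts[i].T @ grad                       # gradient for Ws[i]
        if i > 0:
            grad = (grad @ Ws[i].T) * (acts[i] > 0.0)  # backprop through ReLU
        Ws[i] -= lr * g
    return loss

# Synthetic stand-in for MNIST: random "images" with one-hot targets.
x = rng.standard_normal((32, 784))
y = np.eye(10)[rng.integers(0, 10, size=32)]

# Learning-rate grid, echoing the paper's sweep from 0.00001 to 1.0.
results = {}
for lr in [1e-5, 1e-3, 1e-2]:
    Ws = make_mlp(depth=5)
    results[lr] = [train_step(Ws, x, y, lr) for _ in range(20)]
```

A sweep like this makes the trainability comparison concrete: for a shallow network most learning rates in the grid reduce the loss, whereas for the very deep configurations the paper describes, no rate in the grid does.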