Explicit loss asymptotics in the gradient descent training of neural networks

Authors: Maksim Velikanov, Dmitry Yarotsky

NeurIPS 2021

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
LLM Response: "In Figure 1 we illustrate this approach to the long-term loss evolution with several examples of target functions having different smoothness and dimension and, as a result, exhibiting different exponents. The solid lines show the numerically obtained values, while the dashed lines show the respective theoretical power-law asymptotics. In Figure 2 we compare theoretical and numerical NTK eigenvalue distributions for several dimensions d and data set sizes M. In Figure 3a we compare the theoretical and numerical eigenvalue distributions for several values of d and q."
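The comparison described in this response amounts to checking whether the numerically obtained loss follows the predicted power law L(t) ∝ t^(-ξ) at large training times. Below is a minimal sketch of such a check; the loss history and the theoretical exponent it would be compared against are hypothetical placeholders, not values taken from the paper.

```python
import numpy as np

def fit_power_law_exponent(loss_history, tail_fraction=0.5):
    """Fit L(t) ~ C * t**(-xi) to the late-time tail of a loss curve
    via a straight-line fit in log-log coordinates:
    log L = log C - xi * log t."""
    t = np.arange(1, len(loss_history) + 1)
    start = int(len(loss_history) * (1.0 - tail_fraction))
    log_t = np.log(t[start:])
    log_loss = np.log(np.asarray(loss_history[start:]))
    slope, intercept = np.polyfit(log_t, log_loss, 1)
    return -slope  # estimated exponent xi

# Hypothetical usage: compare the fitted exponent with a theoretical value.
# loss_history = ...  # losses recorded during gradient descent
# xi_empirical = fit_power_law_exponent(loss_history)
# print(xi_empirical, "vs. theoretical exponent xi_theory")
```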
Researcher Affiliation: Academia
LLM Response: "Maksim Velikanov, Skolkovo Institute of Science and Technology, maksim.velikanov@skoltech.ru; Dmitry Yarotsky, Skolkovo Institute of Science and Technology, d.yarotsky@skoltech.ru"

Pseudocode: No
LLM Response: "The paper does not contain any pseudocode or algorithm blocks."

Open Source Code: No
LLM Response: "The paper does not provide any explicit statements about releasing source code or links to a code repository."
Open Datasets: Yes
LLM Response: "The data distributions µ are modeled as mixtures of 8 Gaussian distributions with random centers, and the data dimension is either d = 2 or d = 4. The dataset size is M = 10^4 (see Section A (SM) for further details of experiments). ... application to MNIST (see Figure 3)."
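For illustration, here is a minimal sketch of generating a synthetic dataset of the kind described above (a mixture of 8 Gaussians with random centers, d = 2 or 4, M = 10^4). The component scale `sigma` and the sampling range for the centers are assumptions; the paper's exact settings are given only in its Section A (SM).

```python
import numpy as np

def sample_gaussian_mixture(M=10_000, d=2, n_components=8, sigma=0.1, seed=0):
    """Draw M points in R^d from a mixture of n_components Gaussians
    with random centers, mirroring the data distribution described above."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-1.0, 1.0, size=(n_components, d))  # random centers (assumed range)
    labels = rng.integers(0, n_components, size=M)             # uniform mixture weights (assumed)
    X = centers[labels] + sigma * rng.standard_normal((M, d))
    return X

X = sample_gaussian_mixture(M=10_000, d=4)
print(X.shape)  # (10000, 4)
```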
Dataset Splits: No
LLM Response: "The paper mentions a dataset size of M = 10^4 and refers to supplementary material for details, but it does not specify any training, validation, or test splits in the provided text."

Hardware Specification: No
LLM Response: "The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory)."

Software Dependencies: No
LLM Response: "The paper does not list any specific software dependencies with version numbers."
Experiment Setup: No
LLM Response: "The paper mentions parameters like a 'shallow network with width N = 3000' and refers to Section A (SM) for 'further details of experiments', which is not included in the provided text. It does not provide explicit hyperparameters such as learning rate, batch size, or optimizer settings in the main content."
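To make the reported setting concrete, below is a minimal sketch of training a shallow (one-hidden-layer) network of width N = 3000 with full-batch gradient descent on a squared loss. The ReLU activation, initialization, 1/sqrt(N) output scaling, learning rate, number of steps, and target function are illustrative assumptions, since the paper defers these hyperparameters to its supplementary material.

```python
import numpy as np

def train_shallow_network(X, y, N=3000, lr=1e-2, steps=1000, seed=0):
    """Full-batch gradient descent on a one-hidden-layer ReLU network
    f(x) = a @ relu(W x) / sqrt(N) with mean squared loss.
    The activation, scaling, and learning rate are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    M, d = X.shape
    W = rng.standard_normal((N, d))   # hidden-layer weights
    a = rng.standard_normal(N)        # output weights
    losses = []
    for _ in range(steps):
        Z = X @ W.T                   # (M, N) pre-activations
        H = np.maximum(Z, 0.0)        # ReLU features
        f = H @ a / np.sqrt(N)        # network outputs
        r = f - y                     # residuals
        losses.append(0.5 * np.mean(r ** 2))
        # gradients of the mean squared loss w.r.t. a and W
        grad_a = H.T @ r / (np.sqrt(N) * M)
        mask = (Z > 0.0).astype(float)
        grad_W = a[:, None] * ((mask * r[:, None]).T @ X) / (np.sqrt(N) * M)
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a, np.array(losses)

# Hypothetical usage (smaller N and M give a quick smoke test):
# X = sample_gaussian_mixture(M=1000, d=2)   # dataset sketch from above
# y = np.sin(X.sum(axis=1))                  # an arbitrary target function
# _, _, losses = train_shallow_network(X, y, N=300, steps=200)
```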