Explicit loss asymptotics in the gradient descent training of neural networks
Authors: Maksim Velikanov, Dmitry Yarotsky
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 1 we illustrate this approach to the long-term loss evolution with several examples of target functions having different smoothness and dimension and, as a result, exhibiting different exponents. The solid lines show the numerically obtained values, while the dashed lines show the respective theoretical power-law asymptotics. ... In Figure 2 we compare theoretical and numerical NTK eigenvalue distributions for several dimensions d and data set sizes M. ... In Figure 3a we compare the theoretical and numerical eigenvalue distributions for several values of d and q. (An illustrative power-law fit and an empirical NTK spectrum computation are sketched after this table.) |
| Researcher Affiliation | Academia | Maksim Velikanov, Skolkovo Institute of Science and Technology, maksim.velikanov@skoltech.ru; Dmitry Yarotsky, Skolkovo Institute of Science and Technology, d.yarotsky@skoltech.ru |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | Yes | The data distributions µ are modeled as mixtures of 8 Gaussian distributions with random centers, and the data dimension is either d = 2 or d = 4. The dataset size is M = 10^4 (see Section A (SM) for further details of experiments). ... application to MNIST (see Figure 3). (A hypothetical reconstruction of this synthetic data setup is sketched after the table.) |
| Dataset Splits | No | The paper mentions a dataset size of M=10^4 and refers to supplementary material for details, but it does not specify any training, validation, or test splits in the provided text. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper mentions parameters such as a 'shallow network with width N = 3000' and refers to Section A (SM) for 'further details of experiments', which is not included in the provided text. It does not state explicit hyperparameters such as learning rate, batch size, or optimizer settings in the main text. (A hedged, scaled-down reconstruction of the described setup follows this table.) |
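
Since the paper releases no code (see the Open Source Code row), the following is a minimal, hypothetical reconstruction of the setup quoted in the Open Datasets and Experiment Setup rows: inputs drawn from a mixture of 8 Gaussians with random centers, a shallow ReLU network in NTK parameterization, and full-batch gradient descent on the quadratic loss. Sizes are scaled down from the paper's M = 10^4 and N = 3000 so the sketch runs quickly; the target function, mixture scale, learning rate, and step count are assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d, M, N = 2, 1000, 500  # input dim, dataset size, width (paper: M = 1e4, N = 3000)

# Inputs: mixture of 8 Gaussian components with random centers (as quoted).
centers = rng.normal(size=(8, d))
comp = rng.integers(0, 8, size=M)
X = centers[comp] + 0.3 * rng.normal(size=(M, d))

# Target function: a simple smooth choice (assumption; the paper varies
# target smoothness to obtain different exponents).
y = np.sin(X.sum(axis=1))

# Shallow ReLU network f(x) = a^T relu(W x) / sqrt(N) (NTK scaling).
W = rng.normal(size=(N, d))
a = rng.normal(size=N)

lr, losses = 0.1, []
for step in range(2000):
    h = np.maximum(X @ W.T, 0.0)        # (M, N) hidden activations
    f = h @ a / np.sqrt(N)              # network outputs
    r = f - y                           # residuals
    losses.append(0.5 * np.mean(r ** 2))
    # Full-batch gradients of the mean squared loss.
    grad_a = h.T @ r / (M * np.sqrt(N))
    grad_W = ((r[:, None] * (h > 0)) * a).T @ X / (M * np.sqrt(N))
    a -= lr * grad_a
    W -= lr * grad_W
```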
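
Continuing the sketch above (it reuses `losses`), one way to mimic the Figure 1 comparison between the numerical loss curve and a theoretical power law L(t) ∝ t^(-ξ) is a least-squares line fit in log-log coordinates over the late-time tail. The tail fraction and the fitting method here are our choices; the paper obtains its exponents analytically.

```python
# Continues the training sketch above: fits log L(t) = log C - xi * log t
# over the second half of the loss curve.
t = np.arange(1, len(losses) + 1)
tail = t > len(losses) // 2             # fit only the late-time tail
slope, intercept = np.polyfit(np.log(t[tail]),
                              np.log(np.array(losses)[tail]), 1)
print(f"fitted power-law exponent xi ~ {-slope:.3f}")
```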
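
Figures 2 and 3 of the paper compare theoretical and numerical NTK eigenvalue distributions. A standard way to obtain the numerical side for a shallow network is to assemble the Jacobian J of the outputs with respect to all parameters and diagonalize the empirical NTK Gram matrix Θ = J Jᵀ. The sketch below (reusing `rng`, `X`, and the sizes from the first block) illustrates this generic computation; it is not the paper's own procedure.

```python
# Continues the first sketch: empirical NTK spectrum at initialization.
W0 = rng.normal(size=(N, d))            # fresh parameters at initialization
a0 = rng.normal(size=N)
h0 = np.maximum(X @ W0.T, 0.0)

# Jacobian of outputs w.r.t. all parameters, in NTK scaling 1/sqrt(N).
J_a = h0 / np.sqrt(N)                                 # df_i/da_j: (M, N)
J_W = ((h0 > 0) * a0)[:, :, None] * X[:, None, :]     # df_i/dW_j: (M, N, d)
J = np.concatenate([J_a, J_W.reshape(M, N * d) / np.sqrt(N)], axis=1)

Theta = J @ J.T                         # empirical NTK Gram matrix, (M, M)
eigs = np.linalg.eigvalsh(Theta)[::-1]  # eigenvalues in descending order
print("largest NTK eigenvalues:", np.round(eigs[:5], 3))
```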