Explicit loss asymptotics in the gradient descent training of neural networks

Authors: Maksim Velikanov, Dmitry Yarotsky

NeurIPS 2021

Reproducibility assessment. Each entry below gives the reproducibility variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
LLM Response: "In Figure 1 we illustrate this approach to the long-term loss evolution with several examples of target functions having different smoothness and dimension and, as a result, exhibiting different exponents. The solid lines show the numerically obtained values, while the dashed lines show the respective theoretical power-law asymptotics. In Figure 2 we compare theoretical and numerical NTK eigenvalue distributions for several dimensions d and data set sizes M. In Figure 3a we compare the theoretical and numerical eigenvalue distributions for several values of d and q."
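The comparison described in this response amounts to checking whether the numerically obtained loss follows the predicted power law L(t) ∝ t^(-ξ) at large training times. Below is a minimal sketch of such a check; the loss history and the theoretical exponent it would be compared against are hypothetical placeholders, not values taken from the paper.

```python
import numpy as np

def fit_power_law_exponent(loss_history, tail_fraction=0.5):
    """Fit L(t) ~ C * t**(-xi) to the late-time tail of a loss curve
    via a straight-line fit in log-log coordinates:
    log L = log C - xi * log t."""
    t = np.arange(1, len(loss_history) + 1)
    start = int(len(loss_history) * (1.0 - tail_fraction))
    log_t = np.log(t[start:])
    log_loss = np.log(np.asarray(loss_history[start:]))
    slope, intercept = np.polyfit(log_t, log_loss, 1)
    return -slope  # estimated exponent xi

# Hypothetical usage: compare the fitted exponent with a theoretical value.
# loss_history = ...  # losses recorded during gradient descent
# xi_empirical = fit_power_law_exponent(loss_history)
# print(xi_empirical, "vs. theoretical exponent xi_theory")
```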
Researcher Affiliation: Academia
LLM Response: "Maksim Velikanov, Skolkovo Institute of Science and Technology, maksim.velikanov@skoltech.ru; Dmitry Yarotsky, Skolkovo Institute of Science and Technology, d.yarotsky@skoltech.ru"

Pseudocode: No
LLM Response: "The paper does not contain any pseudocode or algorithm blocks."

Open Source Code: No
LLM Response: "The paper does not provide any explicit statements about releasing source code or links to a code repository."
Open Datasets: Yes
LLM Response: "The data distributions µ are modeled as mixtures of 8 Gaussian distributions with random centers, and the data dimension is either d = 2 or d = 4. The dataset size is M = 10^4 (see Section A (SM) for further details of experiments). ... application to MNIST (see Figure 3)."
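For illustration, here is a minimal sketch of generating a synthetic dataset of the kind described above (a mixture of 8 Gaussians with random centers, d = 2 or 4, M = 10^4). The component scale `sigma` and the sampling range for the centers are assumptions; the paper's exact settings are given only in its Section A (SM).

```python
import numpy as np

def sample_gaussian_mixture(M=10_000, d=2, n_components=8, sigma=0.1, seed=0):
    """Draw M points in R^d from a mixture of n_components Gaussians
    with random centers, mirroring the data distribution described above."""
    rng = np.random.default_rng(seed)
    centers = rng.uniform(-1.0, 1.0, size=(n_components, d))  # random centers (assumed range)
    labels = rng.integers(0, n_components, size=M)             # uniform mixture weights (assumed)
    X = centers[labels] + sigma * rng.standard_normal((M, d))
    return X

X = sample_gaussian_mixture(M=10_000, d=4)
print(X.shape)  # (10000, 4)
```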
Dataset Splits: No
LLM Response: "The paper mentions a dataset size of M = 10^4 and refers to supplementary material for details, but it does not specify any training, validation, or test splits in the provided text."

Hardware Specification: No
LLM Response: "The paper does not provide any specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory)."

Software Dependencies: No
LLM Response: "The paper does not list any specific software dependencies with version numbers."
Experiment Setup: No
LLM Response: "The paper mentions parameters like a 'shallow network with width N = 3000' and refers to Section A (SM) for 'further details of experiments', which is not included in the provided text. It does not provide explicit hyperparameters such as learning rate, batch size, or optimizer settings in the main content."
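To make the reported setting concrete, below is a minimal sketch of training a shallow (one-hidden-layer) network of width N = 3000 with full-batch gradient descent on a squared loss. The ReLU activation, initialization, 1/sqrt(N) output scaling, learning rate, number of steps, and target function are illustrative assumptions, since the paper defers these hyperparameters to its supplementary material.

```python
import numpy as np

def train_shallow_network(X, y, N=3000, lr=1e-2, steps=1000, seed=0):
    """Full-batch gradient descent on a one-hidden-layer ReLU network
    f(x) = a @ relu(W x) / sqrt(N) with mean squared loss.
    The activation, scaling, and learning rate are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    M, d = X.shape
    W = rng.standard_normal((N, d))   # hidden-layer weights
    a = rng.standard_normal(N)        # output weights
    losses = []
    for _ in range(steps):
        Z = X @ W.T                   # (M, N) pre-activations
        H = np.maximum(Z, 0.0)        # ReLU features
        f = H @ a / np.sqrt(N)        # network outputs
        r = f - y                     # residuals
        losses.append(0.5 * np.mean(r ** 2))
        # gradients of the mean squared loss w.r.t. a and W
        grad_a = H.T @ r / (np.sqrt(N) * M)
        mask = (Z > 0.0).astype(float)
        grad_W = a[:, None] * ((mask * r[:, None]).T @ X) / (np.sqrt(N) * M)
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a, np.array(losses)

# Hypothetical usage (smaller N and M give a quick smoke test):
# X = sample_gaussian_mixture(M=1000, d=2)   # dataset sketch from above
# y = np.sin(X.sum(axis=1))                  # an arbitrary target function
# _, _, losses = train_shallow_network(X, y, N=300, steps=200)
```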