Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
Authors: Simone Bombari, Mohammad Hossein Amani, Marco Mondelli
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 1, we consider a 3-layer neural network with $d = n_1 = n_2$, and we plot $\lambda_{\min}(K)$ as a function of $d^2$, for three different values of $N$. The inputs are sampled from a standard Gaussian distribution, the activation function is the sigmoid $\sigma(x) = (1 + e^{-x})^{-1}$, and we set $\beta_l = 1$ for all $l \in [L]$. We repeat the experiment 10 times, and report average and confidence interval at 1 standard deviation. The linear scaling of $\lambda_{\min}(K)$ in $d^2$ is in agreement with the result of Theorem 3.1. The code used to obtain the results of Figure 1 (and Figure 2 as well) is available at https://github.com/simone-bombari/smallest-eigenvalue-NTK/. In Figure 2, we give an illustrative example that 4-layer networks achieve 0 loss when the number of parameters is at least linear in the number of training samples, i.e., under minimum over-parameterization. To ease the experimental setup, we use a ReLU activation, with Adam optimizer. We initialize the network as in the setting of Theorem 3.1, picking $\beta_l = 1$ for all $l \in [L]$. The inputs, as well as the targets, are sampled from a standard Gaussian distribution. The plot is averaged over 10 independent trials. |
| Researcher Affiliation | Academia | Institute of Science and Technology Austria (ISTA). Emails: EMAIL. EPFL, Switzerland. Email: EMAIL. |
| Pseudocode | No | The paper does not contain any sections explicitly labeled “Pseudocode” or “Algorithm”, nor does it present any structured algorithm blocks. |
| Open Source Code | Yes | The code used to obtain the results of Figure 1 (and Figure 2 as well) is available at https://github.com/simone-bombari/smallest-eigenvalue-NTK/. |
| Open Datasets | Yes | CIFAR-10 has $N = 50000$ images and roughly $10^6$ parameters suffice to fit random labels [72]; furthermore, in order to fit random labels to a subset of $1.2 \times 10^6$ ImageNet data points, $2.4 \times 10^7$ parameters are enough [72]. [...] (e.g., data with a Gaussian distribution, uniform on the sphere/hypercube, or obtained via a Generative Adversarial Network) |
| Dataset Splits | No | The paper mentions using |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. It only mentions general setups like |
| Software Dependencies | No | The paper mentions using an |
| Experiment Setup | Yes | In Figure 1, we consider a 3-layer neural network with $d = n_1 = n_2$, and we plot $\lambda_{\min}(K)$ as a function of $d^2$, for three different values of $N$. The inputs are sampled from a standard Gaussian distribution, the activation function is the sigmoid $\sigma(x) = (1 + e^{-x})^{-1}$, and we set $\beta_l = 1$ for all $l \in [L]$. We repeat the experiment 10 times, and report average and confidence interval at 1 standard deviation. [...] In Figure 2, we give an illustrative example that 4-layer networks achieve 0 loss [...] we use a ReLU activation, with Adam optimizer. We initialize the network as in the setting of Theorem 3.1, picking $\beta_l = 1$ for all $l \in [L]$. The inputs, as well as the targets, are sampled from a standard Gaussian distribution. The plot is averaged over 10 independent trials. [...] the initialization $\theta_0$ is defined in (18) with $\gamma = d^3 N^2$ and $\eta \le C(\gamma N d n_{L-1})^{-1}$. |
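
To make the Figure 1 setup quoted in the table concrete, here is a minimal NumPy sketch of how the smallest eigenvalue of an empirical NTK Gram matrix could be computed for a 3-layer sigmoid network with Gaussian inputs. It is not the authors' code (that is in the repository linked above). Several details are assumptions made for illustration: the network has a scalar output, each weight matrix is drawn as N(0, 1/fan-in) as one reading of the choice of all layer scalings equal to 1, and the function names `ntk_gram` and `ntk_min_eigenvalue` and the sizes in the usage loop are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid activation, (1 + exp(-z))^(-1)."""
    return 1.0 / (1.0 + np.exp(-z))


def ntk_gram(X, W1, W2, w3):
    """Empirical NTK Gram matrix K = J J^T for f(x) = w3 . s(W2^T s(W1^T x)).

    J is the Jacobian of the N scalar outputs with respect to all parameters.
    K is assembled layer by layer, using <a b^T, c d^T> = (a . c)(b . d).
    """
    H1 = sigmoid(X @ W1)                  # first hidden features, shape (N, n1)
    H2 = sigmoid(H1 @ W2)                 # second hidden features, shape (N, n2)
    D2 = H2 * (1.0 - H2) * w3             # df/d(pre-activation 2), shape (N, n2)
    D1 = (D2 @ W2.T) * H1 * (1.0 - H1)    # df/d(pre-activation 1), shape (N, n1)
    K_w1 = (X @ X.T) * (D1 @ D1.T)        # contribution of W1
    K_w2 = (H1 @ H1.T) * (D2 @ D2.T)      # contribution of W2
    K_w3 = H2 @ H2.T                      # contribution of the output weights
    return K_w1 + K_w2 + K_w3


def ntk_min_eigenvalue(N, d, rng):
    """Sample Gaussian data and weights (assumed N(0, 1/fan-in)), return lambda_min(K)."""
    X = rng.standard_normal((N, d))
    W1 = rng.standard_normal((d, d)) / np.sqrt(d)
    W2 = rng.standard_normal((d, d)) / np.sqrt(d)
    w3 = rng.standard_normal(d) / np.sqrt(d)
    K = ntk_gram(X, W1, W2, w3)
    return np.linalg.eigvalsh(K)[0]       # eigvalsh returns eigenvalues in ascending order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 50                                # hypothetical sample size
    for d in (10, 20, 40):                # hypothetical widths, with d = n1 = n2
        vals = [ntk_min_eigenvalue(N, d, rng) for _ in range(10)]
        print(f"d^2 = {d * d:5d}  mean = {np.mean(vals):.4g}  std = {np.std(vals):.4g}")
```

Running the script prints the mean and standard deviation of the smallest eigenvalue over 10 trials for each width, which mirrors the averaging described in the quoted setup. Whether the values reproduce the paper's linear scaling in $d^2$ depends on matching the authors' exact initialization and sizes.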