Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
The asymptotic spectrum of the Hessian of DNN throughout training
Authors: Arthur Jacot, Franck Gabriel, Clément Hongler
ICLR 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All our numerical experiments are done with rectangular networks (with $n_1 = \dots = n_{L-1}$) and match closely the predictions for the sequential limit. Figure 1: Comparison of the theoretical prediction of Corollary 1 for the expectation of the first 4 moments (colored lines) to the empirical average over 250 trials (black crosses) for a rectangular network with two hidden layers of finite widths n1 = n2 = 5000 (L = 3) with the smooth ReLU (left) and the normalized smooth ReLU (right), for the MSE loss on scaled down 14x14 MNIST with N = 256. |
| Researcher Affiliation | Academia | Arthur Jacot, Franck Gabriel & Clément Hongler, Chair of Statistical Field Theory, École Polytechnique Fédérale de Lausanne, EMAIL |
| Pseudocode | No | The paper contains mathematical derivations and proofs but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | Figure 1: Comparison of the theoretical prediction of Corollary 1 for the expectation of the first 4 moments (colored lines) to the empirical average over 250 trials (black crosses) for a rectangular network with two hidden layers of finite widths n1 = n2 = 5000 (L = 3) with the smooth ReLU (left) and the normalized smooth ReLU (right), for the MSE loss on scaled down 14x14 MNIST with N = 256. |
| Dataset Splits | No | The paper mentions using a dataset (MNIST with N=256) but does not provide specific training, validation, or test split percentages or sample counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running its numerical experiments. |
| Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | All parameters are initialized as i.i.d. $\mathcal{N}(0, 1)$ Gaussians. In our experiments, we take β = 0.1. The network is trained with respect to the cost functional $\sum_{i=1}^{N} c_i(f(x_i))$, for strictly convex $c_i$, summing over a finite dataset $x_1, \dots, x_N \in \mathbb{R}^{n_0}$ of size N. The parameters are then trained with gradient descent on the composition $C \circ F^{(L)}$, which defines the usual loss surface of neural networks. Figure 1: ...for a rectangular network with two hidden layers of finite widths n1 = n2 = 5000 (L = 3) with the smooth ReLU (left) and the normalized smooth ReLU (right), for the MSE loss on scaled down 14x14 MNIST with N = 256. |
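
Since no source code is released, the setup quoted in the Experiment Setup row can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Only the standard Gaussian initialization, β = 0.1, the rectangular three-layer shape, the 14x14 input size, N = 256, and the MSE loss come from the quoted text. Everything else is an assumption made for illustration: the function names, the 1/sqrt(width) weight scaling with β multiplying the biases, softplus as a stand-in for the smooth ReLU, the scalar output, the averaging over samples, the reduced hidden width of 64 (the paper uses 5000), and random arrays in place of MNIST.

```python
import numpy as np

def init_params(widths, rng):
    """Draw every weight and bias i.i.d. from N(0, 1), as quoted in the setup row."""
    return [(rng.standard_normal((n_out, n_in)), rng.standard_normal(n_out))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def smooth_relu(x):
    # ASSUMPTION: softplus stands in for the paper's smooth ReLU,
    # whose exact definition is not given in this summary.
    return np.logaddexp(0.0, x)

def forward(params, x, beta=0.1):
    # ASSUMPTION: weights scaled by 1/sqrt(fan-in) and biases by beta;
    # the paper's exact normalization may differ.
    a = x
    for layer, (W, b) in enumerate(params):
        if layer > 0:
            a = smooth_relu(a)  # nonlinearity between layers, not on the input
        a = a @ W.T / np.sqrt(W.shape[1]) + beta * b
    return a

def mse_cost(params, x, y, beta=0.1):
    # ASSUMPTION: squared error averaged over the N samples.
    return 0.5 * np.mean(np.sum((forward(params, x, beta) - y) ** 2, axis=1))

rng = np.random.default_rng(0)
widths = [14 * 14, 64, 64, 1]          # hidden width 64 is illustrative only
params = init_params(widths, rng)
x = rng.standard_normal((256, 14 * 14))  # placeholder for 256 downscaled MNIST images
y = rng.standard_normal((256, 1))        # placeholder targets
out = forward(params, x)
cost = mse_cost(params, x, y)
```

Raising the hidden widths toward 5000 and replacing the placeholder arrays with real 14x14 MNIST data would bring the sketch closer to the quoted configuration, though the missing details noted in the table (splits, software versions, hardware) would still have to be chosen by the reader.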