The asymptotic spectrum of the Hessian of DNN throughout training

Authors: Arthur Jacot, Franck Gabriel, Clément Hongler

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | All our numerical experiments are done with rectangular networks (with n1 = ... = n_{L-1}) and match closely the predictions for the sequential limit. Figure 1: Comparison of the theoretical prediction of Corollary 1 for the expectation of the first 4 moments (colored lines) to the empirical average over 250 trials (black crosses) for a rectangular network with two hidden layers of finite widths n1 = n2 = 5000 (L = 3) with the smooth ReLU (left) and the normalized smooth ReLU (right), for the MSE loss on scaled down 14x14 MNIST with N = 256.
Researcher Affiliation | Academia | Arthur Jacot, Franck Gabriel & Clément Hongler, Chair of Statistical Field Theory, École Polytechnique Fédérale de Lausanne, {arthur.jacot,franck.grabriel,clement.hongler}@epfl.ch
Pseudocode | No | The paper contains mathematical derivations and proofs but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | Yes | Figure 1: Comparison of the theoretical prediction of Corollary 1 for the expectation of the first 4 moments (colored lines) to the empirical average over 250 trials (black crosses) for a rectangular network with two hidden layers of finite widths n1 = n2 = 5000 (L = 3) with the smooth ReLU (left) and the normalized smooth ReLU (right), for the MSE loss on scaled down 14x14 MNIST with N = 256.
Dataset Splits | No | The paper mentions using a dataset (MNIST with N = 256) but does not provide specific training, validation, or test split percentages or sample counts.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running its numerical experiments.
Software Dependencies | No | The paper does not provide any specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | All parameters are initialized as iid N(0, 1) Gaussians. In our experiments, we take β = 0.1. The network is trained with respect to the cost functional C(f) = Σ_{i=1}^{N} c_i(f(x_i)) for strictly convex c_i, summing over a finite dataset x_1, ..., x_N ∈ R^{n_0} of size N. The parameters are then trained with gradient descent on the composition C ∘ F^{(L)}, which defines the usual loss surface of neural networks. Figure 1: ...for a rectangular network with two hidden layers of finite widths n1 = n2 = 5000 (L = 3) with the smooth ReLU (left) and the normalized smooth ReLU (right), for the MSE loss on scaled down 14x14 MNIST with N = 256.
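
The moment comparison quoted in the Research Type row can be illustrated with a short numerical sketch. The code below is not the authors' implementation: the toy sizes, the synthetic data standing in for the 14x14 MNIST subset, the NTK-style 1/sqrt(fan-in) scaling, and the use of softplus as a stand-in for the paper's smooth ReLU are all assumptions made for illustration. It estimates the first four spectral moments tr(H^k)/P of the loss Hessian at initialization and averages them over random trials, the quantity plotted in Figure 1 (normalization conventions may differ from the paper's).

```python
import jax
import jax.numpy as jnp

L, WIDTH, N, D_IN = 3, 32, 32, 16   # toy sizes; the paper uses widths of 5000 and N = 256
BETA = 0.1                          # bias scaling beta = 0.1, as quoted in the table above

def init_params(key):
    """Flat parameter vector for an L-layer rectangular network, iid N(0, 1)."""
    dims = [D_IN] + [WIDTH] * (L - 1) + [1]
    num_params = sum(m * n + n for m, n in zip(dims[:-1], dims[1:]))
    return jax.random.normal(key, (num_params,)), dims

def forward(theta, dims, x):
    """Assumed NTK-style parameterization: pre-activations scaled by 1/sqrt(fan_in)."""
    idx, h = 0, x
    for layer, (m, n) in enumerate(zip(dims[:-1], dims[1:])):
        W = theta[idx:idx + m * n].reshape(m, n); idx += m * n
        b = theta[idx:idx + n]; idx += n
        h = h @ W / jnp.sqrt(m) + BETA * b
        if layer < len(dims) - 2:
            h = jax.nn.softplus(h)  # softplus as a stand-in for the paper's smooth ReLU
    return h

def loss(theta, dims, x, y):
    return 0.5 * jnp.mean((forward(theta, dims, x) - y) ** 2)  # MSE loss

def hessian_moments(key, x, y, n_moments=4):
    """Spectral moments tr(H^k)/P of the loss Hessian at initialization."""
    theta, dims = init_params(key)
    H = jax.hessian(lambda t: loss(t, dims, x, y))(theta)
    eigs = jnp.linalg.eigvalsh(H)
    return jnp.array([jnp.mean(eigs ** k) for k in range(1, n_moments + 1)])

# Average over random initializations (20 trials here vs. 250 in the figure),
# with synthetic data standing in for the 14x14 MNIST subset.
key = jax.random.PRNGKey(0)
kx, ky, *trial_keys = jax.random.split(key, 22)
x = jax.random.normal(kx, (N, D_IN))
y = jax.random.normal(ky, (N, 1))
avg_moments = jnp.mean(jnp.stack([hessian_moments(k, x, y) for k in trial_keys]), axis=0)
print(avg_moments)  # empirical averages of the first 4 moments
```

With the paper's widths (n1 = n2 = 5000) the full Hessian is far too large to form explicitly; a stochastic trace estimator built on Hessian-vector products would be the practical route there.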
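
For the Open Datasets row, the paper only states that the input is scaled down 14x14 MNIST with N = 256. One plausible construction, sketched below, is 2x2 average pooling of the raw 28x28 images followed by taking the first 256 examples; the pooling choice, the normalization, and the one-hot MSE targets are assumptions, and `images28` / `labels` stand for arrays obtained from any standard MNIST loader.

```python
import jax.numpy as jnp

def downscale_14x14(images28):
    """2x2 average pooling: (n, 28, 28) -> (n, 196)."""
    n = images28.shape[0]
    pooled = images28.reshape(n, 14, 2, 14, 2).mean(axis=(2, 4))
    return pooled.reshape(n, 14 * 14)

def make_dataset(images28, labels, N=256, num_classes=10):
    x = downscale_14x14(images28[:N].astype(jnp.float32) / 255.0)
    y = jnp.eye(num_classes)[labels[:N]]  # one-hot targets for the MSE loss
    return x, y
```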
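
Finally, the Experiment Setup row describes plain gradient descent on the composition C ∘ F^{(L)}. A minimal sketch, reusing `init_params` and `loss` from the first sketch above, might look as follows; the learning rate and step count are illustrative and not taken from the paper.

```python
import jax

def train(key, x, y, lr=1.0, steps=1000):
    theta, dims = init_params(key)  # all parameters iid N(0, 1)
    grad_loss = jax.grad(loss)      # gradient of C o F^(L) w.r.t. the parameters
    for _ in range(steps):
        theta = theta - lr * grad_loss(theta, dims, x, y)  # full-batch gradient descent
    return theta, dims
```

The Hessian spectrum "throughout training" can then be tracked by recomputing the moments along the optimization trajectory, e.g. by calling jax.hessian on the current parameters at selected steps.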