Comparing Dynamics: Deep Neural Networks versus Glassy Systems

Authors: Marco Baity-Jesi, Levent Sagun, Mario Geiger, Stefano Spigler, Gérard Ben Arous, Chiara Cammarota, Yann LeCun, Matthieu Wyart, Giulio Biroli

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. The analysis is performed for several different architectures, see Sec. 3, varying from specific toy models to ResNets (He et al., 2016), which are evaluated on popular datasets such as MNIST and CIFAR. We present our core results in two parts: time dependence of the loss function (Sec. 3.1), and identifying different regimes through the two-point correlation function (Sec. 3.2). (See the two-point-correlation sketch after this table.)
Researcher Affiliation | Collaboration | (1) Department of Chemistry, Columbia University, New York, NY 10027, USA; (2) Institut de Physique Théorique, Université Paris Saclay, CEA, CNRS, F-91191 Gif-sur-Yvette, France; (3) EPFL, Lausanne, Switzerland; (4) Courant Institute of Mathematical Sciences, New York University, New York, USA; (5) King's College London, Department of Mathematics, Strand, London WC2R 2LS, United Kingdom; (6) Center for Data Science, New York University, New York, USA; (7) Facebook AI Research, Facebook Inc., New York, USA; (8) Laboratoire de Physique Statistique, École Normale Supérieure, CNRS, PSL Research University, Sorbonne Universités, 75005 Paris, France.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. It thanks individuals for providing "the initial version of the code that we used in our numerical simulations" but does not state that the authors' own implementation is publicly available.
Open Datasets | Yes | The analysis is performed for several different architectures, see Sec. 3, varying from specific toy models to ResNets (He et al., 2016), which are evaluated on popular datasets such as MNIST and CIFAR. (See the dataset-loading sketch after this table.)
Dataset Splits | No | The figures report "train loss" and "test loss", implying a train/test split, but the paper does not specify exact percentages or sample counts for training, validation, or test sets, nor does it state that the standard predefined splits of these datasets are used.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models, or cloud computing specifications.
Software Dependencies | Yes | All networks are initialized in the standard procedures of the PyTorch library (version 0.3.0).
Experiment Setup | Yes | The training is carried out by SGD with a single learning rate that remains unchanged until the end of the computation. The batch size B and learning rate α are specified under each plot (e.g., the caption of Figure 2(a): "Toy Model on CIFAR-10, m = 10^4, B = 100, α = 0.1"). (See the training-setup sketch after this table.)
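
The "Research Type" row points to the paper's use of a two-point correlation function to separate dynamical regimes (Sec. 3.2). The paper's exact estimator and normalization are not reproduced here; the sketch below is a minimal Python illustration assuming a standard normalized-overlap definition between weight snapshots taken at a waiting time t_w and a later time t_w + t. The helper names flatten_weights and two_point_correlation are hypothetical.

    # Hedged sketch: normalized two-point overlap between weight snapshots,
    # one common choice in the glassy-dynamics literature. The paper's exact
    # definition may differ; all names here are illustrative.
    import numpy as np

    def flatten_weights(params):
        """Concatenate all parameter arrays of one snapshot into a 1-D vector."""
        return np.concatenate([np.ravel(p) for p in params])

    def two_point_correlation(w_a, w_b):
        """Normalized overlap C(t_w, t_w + t) between two flattened snapshots."""
        return float(np.dot(w_a, w_b) /
                     (np.linalg.norm(w_a) * np.linalg.norm(w_b)))

    # Toy usage with random arrays standing in for weights saved at SGD steps
    # 10 and 1000; in practice the arrays would come from model.parameters().
    rng = np.random.default_rng(0)
    snapshots = {step: [rng.normal(size=(100, 100)), rng.normal(size=100)]
                 for step in (10, 1000)}
    c = two_point_correlation(flatten_weights(snapshots[10]),
                              flatten_weights(snapshots[1000]))
    print(f"C(t_w=10, t=990) = {c:.3f}")

Scanning the waiting time t_w at fixed lag t with such a measurement is the usual way to probe aging, i.e., the slowing down of the dynamics discussed in the paper.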
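The "Open Datasets" row confirms that evaluation uses MNIST and CIFAR, both publicly available. A hedged sketch of obtaining them with torchvision follows; the root directory and the plain ToTensor() preprocessing are assumptions, not details from the paper's (unreleased) code.

    # Hedged sketch: fetching the public datasets named in the paper.
    # Preprocessing is a minimal assumption (ToTensor only), not the paper's setup.
    import torchvision
    import torchvision.transforms as transforms

    transform = transforms.ToTensor()

    mnist_train = torchvision.datasets.MNIST(root="./data", train=True,
                                             download=True, transform=transform)
    mnist_test = torchvision.datasets.MNIST(root="./data", train=False,
                                            download=True, transform=transform)
    cifar_train = torchvision.datasets.CIFAR10(root="./data", train=True,
                                               download=True, transform=transform)
    cifar_test = torchvision.datasets.CIFAR10(root="./data", train=False,
                                              download=True, transform=transform)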
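The "Experiment Setup" row reports plain SGD with a single, constant learning rate, with B and α given per figure (e.g., B = 100, α = 0.1 for the toy model on CIFAR-10). The sketch below reproduces only that reported configuration; the placeholder two-layer network, the epoch count, and the use of a current PyTorch API (rather than the paper's version 0.3.0) are assumptions. The layers rely on PyTorch's default initialization, consistent with the "Software Dependencies" row.

    # Hedged sketch: constant-learning-rate SGD as reported (B = 100, alpha = 0.1).
    # The network is a placeholder, not one of the paper's toy models or ResNets.
    import torch
    import torch.nn as nn
    import torchvision
    from torch.utils.data import DataLoader

    cifar_train = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True,
        transform=torchvision.transforms.ToTensor())
    train_loader = DataLoader(cifar_train, batch_size=100, shuffle=True)  # B = 100

    model = nn.Sequential(nn.Flatten(),                      # default PyTorch init
                          nn.Linear(3 * 32 * 32, 512),
                          nn.ReLU(),
                          nn.Linear(512, 10))
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # alpha = 0.1, no schedule

    for epoch in range(10):                                  # epoch count is an assumption
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()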