Entropy and mutual information in models of deep neural networks

Authors: Marylou Gabrié, Andre Manoel, Clément Luneau, Jean Barbier, Nicolas Macris, Florent Krzakala, Lenka Zdeborová

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose an experiment framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning. We study the behavior of entropies and mutual informations throughout learning and conclude that, in the proposed setting, the relationship between compression and generalization remains elusive. We present a series of experiments both aiming at further validating the replica estimator and leveraging its power in noteworthy applications.
Researcher Affiliation | Collaboration | Marylou Gabrié (1), Andre Manoel (2,3), Clément Luneau (4), Jean Barbier (1,4,5), Nicolas Macris (4), Florent Krzakala (1,6,7) and Lenka Zdeborová (3,6). (1) Laboratoire de Physique Statistique, École Normale Supérieure, PSL University; (2) Parietal Team, INRIA, CEA, Université Paris-Saclay & Owkin Inc., New York; (3) Institut de Physique Théorique, CEA, CNRS, Université Paris-Saclay; (4) Laboratoire de Théorie des Communications, École Polytechnique Fédérale de Lausanne; (5) International Center for Theoretical Physics, Trieste, Italy; (6) Department of Mathematics, Duke University, Durham NC; (7) Sorbonne Universités & LightOn Inc., Paris
Pseudocode | No | The paper describes mathematical formulas and procedures in text but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Moreover, a user-friendly Python package is provided [13], which performs the computation for different choices of prior P0, activations ϕℓ and spectra λWℓ. We provide a second Python package [50] to implement in Keras learning experiments on synthetic datasets, using USV-layers and interfacing the first Python package [13] for replica computations. [13] dnner: Deep Neural Networks Entropy with Replicas, Python library. https://github.com/sphinxteam/dnner. [50] lsd: Learning with Synthetic Data, Python library. https://github.com/marylou-gabrie/learning-synthetic-data. (A minimal, hedged installation sketch is given after the table.)
Open Datasets | No | The multi-layer model presented above can be leveraged to simulate two prototypical settings of deep supervised learning on synthetic datasets amenable to the tractable replica computation of entropies and mutual informations. In Section 3.2 of the Supplementary Material [12] we train a neural network with USV-layers on a simple real-world dataset (MNIST), showing that these layers can learn to represent complex functions despite their restriction. The primary experiments are on synthetic datasets generated by the authors, for which public access information (link, DOI, or formal citation with author/year) is not provided. While MNIST is mentioned as being used in the supplementary material, it is not the main dataset in the presented experiments, and no specific access information for it is given in the main text. (A hedged sketch of such a synthetic teacher-student dataset, including the equal train/test split, follows the table.)
Dataset Splits | No | After generating a train and test set in this manner, we perform the training of a deep neural network, the student, on the synthetic dataset. The sizes of the training and testing sets are taken equal and scale typically as a few hundreds times the size of the input layer. The paper mentions training and testing sets but does not specify a separate validation set or provide specific percentages/counts for data splits.
Hardware Specification | Yes | We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Software Dependencies | No | We provide a second Python package [50] to implement in Keras learning experiments on synthetic datasets... The paper mentions Python packages and Keras but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | No | Hence for all experiments we use plain stochastic gradient descent (SGD) with constant learning rates, without momentum and without any explicit form of regularization. We train a student network of three USV-layers, plus one fully connected unconstrained layer, X → T1 → T2 → T3 → Ŷ, on the regression task, using plain SGD for the MSE loss (Ŷ − Y)². We compare two 5-layer recognition models with 4 USV-layers plus one unconstrained layer, of sizes 500-1000-500-250-100-2, and activations either linear-ReLU-linear-ReLU-softmax (top row of Figure 4) or linear-hardtanh-linear-hardtanh-softmax (bottom row). While the paper describes the optimizer, loss functions, and network architectures and sizes, it does not provide specific hyperparameter values such as the exact learning rate, batch size, or total number of epochs. (A hedged Keras sketch of a USV-constrained layer and this training setup follows the table.)
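
Installation sketch referenced in the Open Source Code row. It assumes only what the cited URLs state, namely that the two repositories [13] and [50] are hosted on GitHub; whether they can be installed directly with pip as shown is an assumption, and no function or module names from either package are used.

    # Hedged sketch: fetch the two cited packages. The git URLs come from
    # references [13] and [50]; pip-installability from a git URL is assumed.
    import subprocess
    import sys

    repos = [
        "https://github.com/sphinxteam/dnner",                        # [13] replica entropy computations
        "https://github.com/marylou-gabrie/learning-synthetic-data",  # [50] Keras experiments on synthetic data
    ]
    for url in repos:
        subprocess.run([sys.executable, "-m", "pip", "install", f"git+{url}"], check=True)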
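
Data-generation sketch referenced in the Open Datasets and Dataset Splits rows. It illustrates the kind of teacher-student synthetic dataset the quoted passages describe: inputs drawn from a simple generative model, labels produced by a fixed random teacher, and equal-sized train and test sets a few hundred times the input dimension. The specific sizes, activations, single-layer teacher and the 200x sizing factor are illustrative assumptions, not values from the paper.

    # Minimal sketch of a teacher-student synthetic dataset in the spirit of the
    # paper's setup. The paper's generative models are multi-layer; a single
    # linear-plus-ReLU generator is used here only to keep the sketch short.
    import numpy as np

    rng = np.random.default_rng(0)

    n_input, n_latent = 100, 25
    n_train = n_test = 200 * n_input      # equal train/test sets, a few hundred times n_input (factor assumed)
    n_samples = n_train + n_test

    # Generative model for the inputs: X = relu(A z) with a Gaussian latent z.
    A = rng.standard_normal((n_input, n_latent)) / np.sqrt(n_latent)
    z = rng.standard_normal((n_samples, n_latent))
    X = np.maximum(z @ A.T, 0.0)          # shape (n_samples, n_input)

    # Fixed random teacher producing scalar regression targets.
    w_teacher = rng.standard_normal(n_input) / np.sqrt(n_input)
    Y = np.tanh(X @ w_teacher)

    X_train, X_test = X[:n_train], X[n_train:]
    Y_train, Y_test = Y[:n_train], Y[n_train:]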
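
Architecture sketch referenced in the Experiment Setup row. It shows one way to realize a USV-constrained layer in Keras, reading the paper's constraint as: the weight matrix is factored into fixed random matrices U and V, with only the diagonal (singular-value) factor trained. The orthogonal choice for U and V, the "ones" initializer, and the learning rate, loss and other hyperparameters are placeholders or assumptions, since the text does not specify them; the layer sizes and activations follow the quoted 500-1000-500-250-100-2 description.

    # Hedged sketch of a "USV-layer" in Keras: W = U diag(s) V^T with U, V fixed
    # at random (non-trainable) and only the vector s trained.
    import numpy as np
    import tensorflow as tf
    from tensorflow import keras


    class USVLayer(keras.layers.Layer):
        def __init__(self, units, activation=None, **kwargs):
            super().__init__(**kwargs)
            self.units = units
            self.activation = keras.activations.get(activation)

        def build(self, input_shape):
            d_in = int(input_shape[-1])
            k = min(d_in, self.units)
            # Fixed random orthogonal factors (orthogonality is an assumption).
            u = np.linalg.qr(np.random.randn(d_in, k))[0]
            v = np.linalg.qr(np.random.randn(self.units, k))[0]
            self.U = tf.constant(u, dtype=self.dtype)
            self.V = tf.constant(v, dtype=self.dtype)
            # Only the "singular values" s are trained.
            self.s = self.add_weight(name="s", shape=(k,), initializer="ones", trainable=True)

        def call(self, x):
            w = self.U @ tf.linalg.diag(self.s) @ tf.transpose(self.V)  # (d_in, units)
            return self.activation(x @ w)


    # 5-layer recognition model from the quoted description: 4 USV-layers plus one
    # unconstrained layer, sizes 500-1000-500-250-100-2, linear-ReLU-linear-ReLU-softmax.
    model = keras.Sequential([
        keras.Input(shape=(500,)),
        USVLayer(1000, activation="linear"),
        USVLayer(500, activation="relu"),
        USVLayer(250, activation="linear"),
        USVLayer(100, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),
    ])

    # Plain SGD with a constant learning rate, no momentum, no explicit regularization;
    # the learning rate and loss are placeholders since the paper does not report them.
    model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.1),
                  loss="categorical_crossentropy")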