Estimating Information Flow in Deep Neural Networks

Authors: Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results verify this connection. In Section 5.1 we experimentally demonstrate that, in some cases, I(X; Tℓ) exhibits compression during training of noisy DNNs. We trained four-layer convolutional neural networks (CNNs) on MNIST (LeCun et al., 1999). ... We measured their performance on the validation set and characterized the cosine similarities between their internal representations... The experiments demonstrate that I(X; Tℓ) compression in noisy DNNs is driven by clustering of internal representations, and that deterministic DNNs cluster samples as well. (The noisy-layer construction is sketched in code after the table.)
Researcher Affiliation | Collaboration | Ziv Goldfeld (1,2), Ewout van den Berg (2,3), Kristjan Greenewald (2,3), Igor Melnyk (2,3), Nam Nguyen (2,3), Brian Kingsbury (2,3), Yury Polyanskiy (1,2). Affiliations: (1) Massachusetts Institute of Technology, (2) MIT-IBM Watson AI Lab, (3) IBM Research.
Pseudocode | No | No direct match. The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | Code to replicate the experiments in this paper is in preparation, and Goldfeld et al. (2018) will be updated when it is available.
Open Datasets | Yes | we trained four-layer convolutional neural networks (CNNs) on MNIST (LeCun et al., 1999).
Dataset Splits | No | No direct match. The paper mentions using a "validation set" and reports "MNIST validation errors", but it does not specify exact percentages or sample counts for the training, validation, or test splits, nor does it cite a predefined split.
Hardware Specification | No | No direct match. The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.
Software Dependencies | No | No direct match. The paper does not specify version numbers for any software dependencies, such as programming languages, libraries (e.g., PyTorch, TensorFlow), or other tools used in the experiments.
Experiment Setup | Yes | The CNNs used different internal noise levels (including β = 0) and one used dropout instead of additive noise. Let σ = tanh, β = 0.01 and X = X₋₁ ∪ X₁, with X₋₁ = {-3, -1, 1} and X₁ = {3}, labeled -1 and 1, respectively. We train the neuron using mean squared loss and gradient descent with learning rate 0.01 to illustrate I(X; T(k)) trends. The FCN was tested with tanh and ReLU nonlinearities as well as a linear model. Fig. 5(a) presents results for the tanh SZT model with β = 0.005 (test classification accuracy 97%). (This single-neuron setup is sketched in code after the table.)
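
The noisy-DNN construction quoted in the Research Type row adds isotropic Gaussian noise of standard deviation β after each hidden activation, which is what makes I(X; Tℓ) finite and estimable in the first place. Below is a minimal PyTorch sketch of that construction, assuming illustrative layer widths and helper names (NoisyLayer and cosine_similarity_matrix are not from the paper); only the additive internal noise and the use of pairwise cosine similarities between internal representations follow the quoted description.

    # Illustrative sketch, not the authors' released code:
    # T_l = sigma(W_l T_{l-1} + b_l) + Z_l, with Z_l ~ N(0, beta^2 I),
    # i.e. additive isotropic Gaussian noise applied after the nonlinearity.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NoisyLayer(nn.Module):
        """Fully connected layer, tanh activation, then additive Gaussian noise of std beta."""
        def __init__(self, in_dim, out_dim, beta=0.005, activation=torch.tanh):
            super().__init__()
            self.fc = nn.Linear(in_dim, out_dim)
            self.beta = beta
            self.activation = activation

        def forward(self, x):
            t = self.activation(self.fc(x))
            if self.beta > 0:
                t = t + self.beta * torch.randn_like(t)  # internal noise Z_l
            return t

    def cosine_similarity_matrix(reps):
        """Pairwise cosine similarities between internal representations (rows of reps),
        the quantity used to relate compression of I(X; T_l) to clustering."""
        reps = F.normalize(reps, dim=1)
        return reps @ reps.t()

    # Usage sketch: a tiny two-layer noisy network and the similarity structure of its hidden layer.
    layer1, layer2 = NoisyLayer(784, 128), NoisyLayer(128, 10)
    x = torch.randn(32, 784)                 # stand-in for a batch of flattened MNIST images
    hidden = layer1(x)
    logits = layer2(hidden)
    sims = cosine_similarity_matrix(hidden)  # (32, 32) matrix of pairwise similarities

Note that the noise here is part of the model itself rather than a training-time regularizer, matching the paper's point that I(X; Tℓ) is only meaningful when the layer map is stochastic.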
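
The single-neuron example quoted in the Experiment Setup row (σ = tanh, β = 0.01, inputs {-3, -1, 1} labeled -1 and {3} labeled 1, mean squared loss, gradient descent with learning rate 0.01) can be illustrated as follows. This is an assumption-laden sketch rather than the authors' implementation: X is assumed uniform over the four points, the step count and initialization are assumed, and I(X; T) for T = tanh(wX + b) + Z, Z ~ N(0, β²), is estimated by simple Monte Carlo over the resulting four-component Gaussian mixture, whereas the paper develops its own differential-entropy estimators.

    # Sketch of the single noisy neuron example; hyperparameters beta and lr come from the
    # quoted setup, everything else (steps, init, MC estimator) is an illustrative assumption.
    import numpy as np

    rng = np.random.default_rng(0)
    xs = np.array([-3.0, -1.0, 1.0, 3.0])    # X_{-1} = {-3, -1, 1}, X_1 = {3}
    ys = np.array([-1.0, -1.0, -1.0, 1.0])   # labels -1 and 1
    beta, lr, steps = 0.01, 0.01, 2000
    w, b = 0.1, 0.0

    def mutual_information(w, b, beta, n_samples=100_000):
        """Monte Carlo estimate of I(X; T) = h(T) - h(T | X) in nats."""
        mus = np.tanh(w * xs + b)                        # class-conditional means of T
        x_idx = rng.integers(len(xs), size=n_samples)    # X uniform over the four points
        t = mus[x_idx] + beta * rng.standard_normal(n_samples)
        # Marginal density of T: equal-weight mixture of Gaussians centered at each mu.
        dens = np.mean(
            np.exp(-(t[:, None] - mus[None, :]) ** 2 / (2 * beta ** 2)), axis=1
        ) / (np.sqrt(2 * np.pi) * beta)
        h_t = -np.mean(np.log(dens))                             # h(T), Monte Carlo
        h_t_given_x = 0.5 * np.log(2 * np.pi * np.e * beta ** 2) # h(T | X) = h(Z)
        return h_t - h_t_given_x

    for step in range(steps + 1):
        t = np.tanh(w * xs + b) + beta * rng.standard_normal(len(xs))   # noisy output
        grad = 2 * (t - ys) * (1 - np.tanh(w * xs + b) ** 2) / len(xs)  # d(MSE)/d(pre-activation)
        w -= lr * np.sum(grad * xs)                                     # full-batch gradient descent
        b -= lr * np.sum(grad)
        if step % 500 == 0:
            print(f"step {step:5d}  I(X;T) ~ {mutual_information(w, b, beta):.3f} nats")

With four equiprobable inputs, I(X; T) is bounded by ln 4 ≈ 1.39 nats; tracking the printed values over training gives a small-scale picture of the I(X; T(k)) trends the quoted setup refers to.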