Estimating Information Flow in Deep Neural Networks
Authors: Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy
ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results verify this connection. In Section 5.1 we experimentally demonstrate that, in some cases, I(X; Tℓ) exhibits compression during training of noisy DNNs. We trained four-layer convolutional neural networks (CNNs) on MNIST (LeCun et al., 1999). ... We measured their performance on the validation set and characterized the cosine similarities between their internal representations... The experiments demonstrate that I(X; Tℓ) compression in noisy DNNs is driven by clustering of internal representations, and that deterministic DNNs cluster samples as well. (See the noisy-layer sketch after the table.) |
| Researcher Affiliation | Collaboration | Ziv Goldfeld (1, 2), Ewout van den Berg (2, 3), Kristjan Greenewald (2, 3), Igor Melnyk (2, 3), Nam Nguyen (2, 3), Brian Kingsbury (2, 3), Yury Polyanskiy (1, 2); 1: Massachusetts Institute of Technology, 2: MIT-IBM Watson AI Lab, 3: IBM Research. |
| Pseudocode | No | No direct match. The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Code to replicate the experiments in this paper is in preparation, and Goldfeld et al. (2018) will be updated when it is available. |
| Open Datasets | Yes | we trained four-layer convolutional neural networks (CNNs) on MNIST (LeCun et al., 1999). |
| Dataset Splits | No | No direct match. The paper mentions using a "validation set" and reports "MNIST validation errors", but it does not specify the exact percentages or sample counts for training, validation, or test splits. It also does not explicitly reference predefined splits with a citation specifically for the split methodology. |
| Hardware Specification | No | No direct match. The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | No direct match. The paper does not specify the version numbers for any software dependencies, such as programming languages, libraries (e.g., PyTorch, TensorFlow), or other tools used in their experiments. |
| Experiment Setup | Yes | The CNNs used different internal noise levels (including β = 0) and one used dropout instead of additive noise. Let σ = tanh, β = 0.01 and X = X₋₁ ∪ X₁, with X₋₁ = {−3, −1, 1} and X₁ = {3}, labeled −1 and 1, respectively. We train the neuron using mean squared loss and gradient descent with learning rate 0.01 to illustrate I(X; T(k)) trends. The FCN was tested with tanh and ReLU nonlinearities as well as a linear model. Fig. 5(a) presents results for the tanh SZT model with β = 0.005 (test classification accuracy 97%). (See the single-neuron training sketch after the table.) |
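
The noisy networks quoted in the Research Type row add zero-mean Gaussian noise to each internal representation so that I(X; Tℓ) is finite and can be estimated. Below is a minimal sketch of such a layer, assuming PyTorch; the class name `NoisyTanhLayer` and the fully connected form are illustrative choices, not the authors' released code (which the paper states was still in preparation). The default β = 0.005 matches one of the noise levels quoted above.

```python
import torch
import torch.nn as nn

class NoisyTanhLayer(nn.Module):
    """tanh(Wx + b) + Z, with Z ~ N(0, beta^2 I) drawn fresh on every forward pass."""

    def __init__(self, in_features: int, out_features: int, beta: float = 0.005):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.beta = beta  # internal noise level; beta = 0 gives a deterministic layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = torch.tanh(self.linear(x))
        if self.beta > 0:
            t = t + self.beta * torch.randn_like(t)
        return t

# Example: a hidden representation T_l for a batch of flattened MNIST images.
layer = NoisyTanhLayer(28 * 28, 128, beta=0.005)
t = layer(torch.randn(32, 28 * 28))
```

Setting `beta=0` recovers the deterministic baseline mentioned in the Experiment Setup row; cosine similarities between the internal representations of two trained networks, as described in the Research Type row, can then be compared with `torch.nn.functional.cosine_similarity`.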
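
For the single-neuron example in the Experiment Setup row (σ = tanh, β = 0.01, inputs X₋₁ = {−3, −1, 1} and X₁ = {3}, mean squared loss, learning rate 0.01), the sketch below trains the neuron and tracks I(X; T) across epochs. It is an assumption-laden illustration: the initialization, epoch count, and the plain Monte Carlo estimate of the Gaussian-mixture entropy stand in for the paper's estimator, and noise is applied only when measuring I(X; T), not during the gradient steps.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 0.01                               # internal noise std from the quoted setup
xs = np.array([-3.0, -1.0, 1.0, 3.0])     # X = X_{-1} ∪ X_1
ys = np.array([-1.0, -1.0, -1.0, 1.0])    # labels -1 and 1

def gaussian_mixture_entropy(means, std, n_samples=100_000):
    """Monte Carlo estimate of h(T) for an equal-weight Gaussian mixture."""
    k = len(means)
    comp = rng.integers(k, size=n_samples)
    t = means[comp] + std * rng.standard_normal(n_samples)
    # mixture density evaluated at the sampled points
    dens = np.mean(
        np.exp(-(t[:, None] - means[None, :]) ** 2 / (2 * std**2))
        / (np.sqrt(2 * np.pi) * std),
        axis=1,
    )
    return -np.mean(np.log(dens))

w, b, lr = 0.1, 0.0, 0.01                 # illustrative initialization, lr from the paper
for epoch in range(2000):
    t_clean = np.tanh(w * xs + b)
    # gradient of the mean squared loss through tanh
    grad = 2 * (t_clean - ys) * (1 - t_clean**2)
    w -= lr * np.mean(grad * xs)
    b -= lr * np.mean(grad)
    if epoch % 500 == 0:
        h_t = gaussian_mixture_entropy(np.tanh(w * xs + b), beta)
        h_z = 0.5 * np.log(2 * np.pi * np.e * beta**2)   # h(Z) for Gaussian noise
        print(f"epoch {epoch}: I(X;T) ≈ {h_t - h_z:.3f} nats "
              f"(max = ln 4 ≈ {np.log(4):.3f})")
```

Because T = tanh(wX + b) + Z with Z independent of X, I(X; T) = h(T) − h(Z), and h(T) is the differential entropy of an equal-weight four-component Gaussian mixture. The estimate approaches ln 4 while the four inputs map to well-separated outputs and decreases as same-label inputs cluster, which is the compression mechanism the paper describes.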