Information Bottleneck: Exact Analysis of (Quantized) Neural Networks

Authors: Stephan Sloth Lorenzen, Christian Igel, Mads Nielsen

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our study confirms that different ways of binning when computing the MI lead to qualitatively different results, either supporting or refuting IB conjectures. To resolve the controversy, we study the IB principle in settings where MI is non-trivial and can be computed exactly. We monitor the dynamics of quantized neural networks, that is, we discretize the whole deep learning system so that no approximation is required when computing the MI. This allows us to quantify the information flow without measurement errors. In this setting, we observed a fitting phase for all layers and a compression phase for the output layer in all experiments; the compression in the hidden layers was dependent on the type of activation function. (A sketch of such an exact MI computation on discrete activations is given after this table.)
Researcher Affiliation | Academia | Stephan S. Lorenzen, Christian Igel & Mads Nielsen, Department of Computer Science, University of Copenhagen, {lorenzen,igel,madsn}@di.ku.dk
Pseudocode | No | The paper does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation is based on Tensorflow (Abadi et al., 2016). Experiments were run on an Intel Core i9-9900 CPU with 8 × 3.10 GHz cores and an NVIDIA Quadro P2200 GPU. Available at: https://github.com/StephanLorenzen/Exact_IBAnalysis_In_QNNs
Open Datasets | Yes | The setting of Shwartz-Ziv & Tishby (2017) and Saxe et al. (2018), using the network shown in Figure 1a with either TANH or RELU activations, fitted on the same synthetic data set consisting of |D| = 2^12 12-bit binary input patterns with balanced binary output (Section 4.1). Learning a two-dimensional representation for the MNIST handwritten digits, where a fully connected network similar to the one in the previous setting, using a bottleneck architecture with RELU activations for all hidden layers, is trained (Section 4.2). (A sketch of the synthetic data set and the train/test split is given after this table.)
Dataset Splits | Yes | As in previous studies (Shwartz-Ziv & Tishby, 2017; Saxe et al., 2018), we used an 80%/20% training/test split to monitor fitting, while computing the MI based on the entire data set (training and test data).
Hardware Specification | Yes | Experiments were run on an Intel Core i9-9900 CPU with 8 × 3.10 GHz cores and an NVIDIA Quadro P2200 GPU.
Software Dependencies | No | Our implementation is based on Tensorflow (Abadi et al., 2016). The paper mentions Tensorflow but does not specify a version number for it or any other software dependencies.
Experiment Setup | Yes | The networks were trained using mini-batches of size 256 and the Adam optimizer with a learning rate of 10^-4. Weights of layer T were initialized using a truncated normal with mean 0 and standard deviation 1/sqrt(d_T). (A sketch of this training configuration is given after this table.)
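
Illustration for the Research Type row: because the networks are fully quantized, each layer representation T takes only finitely many values, so I(T;Y) (and likewise I(X;T)) can be computed exactly by counting joint frequencies over the data set, with no binning of continuous activations. The sketch below is a minimal version of such a counting-based computation; the function name exact_mi and the toy data are illustrative and are not taken from the authors' released code.

```python
import numpy as np

def exact_mi(t, y):
    """Exact mutual information I(T;Y) in bits for discrete variables.

    t: (n, d) array of quantized layer activations (finitely many values).
    y: (n,) array of discrete labels (use input pattern ids for I(X;T)).
    """
    n = t.shape[0]
    # Map each distinct activation pattern / label value to an integer id.
    t_ids = np.unique(t, axis=0, return_inverse=True)[1].ravel()
    y_ids = np.unique(y, return_inverse=True)[1].ravel()

    # Empirical joint distribution p(t, y) from a contingency table.
    joint = np.zeros((t_ids.max() + 1, y_ids.max() + 1))
    np.add.at(joint, (t_ids, y_ids), 1.0)
    joint /= n
    p_t = joint.sum(axis=1, keepdims=True)   # marginal p(t)
    p_y = joint.sum(axis=0, keepdims=True)   # marginal p(y)

    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (p_t @ p_y)[nz])))

# Toy usage: two "quantized" units, binary labels.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)
t = np.stack([y + rng.integers(0, 2, size=1000),       # label plus noise (correlated with y)
              rng.integers(0, 3, size=1000)], axis=1)   # label-independent unit
print(exact_mi(t, y))
```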
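Illustration for the Open Datasets and Dataset Splits rows: the synthetic task uses all |D| = 2^12 = 4096 12-bit binary input patterns with a balanced binary output, an 80%/20% training/test split for monitoring fitting, and MI computed on the full data set. The sketch below constructs such a data set and split; the labelling rule (a random balanced assignment) is a placeholder, since the paper's actual output function is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)

# All |D| = 2^12 = 4096 distinct 12-bit binary input patterns.
n_bits = 12
X = np.array([[(i >> b) & 1 for b in range(n_bits)] for i in range(2 ** n_bits)],
             dtype=np.float32)

# Balanced binary output: exactly half of the patterns in each class.
# Placeholder labelling -- the paper's actual output function is not reproduced here.
y = np.zeros(len(X), dtype=np.int64)
y[rng.permutation(len(X))[: len(X) // 2]] = 1

# 80%/20% training/test split for monitoring fitting; MI is computed
# on the entire data set (training and test data together).
perm = rng.permutation(len(X))
n_train = int(0.8 * len(X))
X_train, y_train = X[perm[:n_train]], y[perm[:n_train]]
X_test, y_test = X[perm[n_train:]], y[perm[n_train:]]
```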
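Illustration for the Experiment Setup row: one way to express the reported configuration in Keras (mini-batches of 256, Adam with learning rate 10^-4, truncated-normal initialization with standard deviation 1/sqrt(d_T)). The layer widths are placeholders, d_T is assumed here to be the layer's fan-in, and the quantization of weights and activations is not shown.

```python
import numpy as np
import tensorflow as tf

def dense(units, d_in, activation="tanh"):
    """Dense layer with truncated-normal init, stddev = 1/sqrt(d_T).

    d_T is taken here to be the layer's fan-in d_in (an assumption).
    """
    init = tf.keras.initializers.TruncatedNormal(mean=0.0, stddev=1.0 / np.sqrt(d_in))
    return tf.keras.layers.Dense(units, activation=activation, kernel_initializer=init)

# Placeholder layer widths; the exact architecture (Figure 1a) is not reproduced here.
widths = [12, 10, 7, 5, 4, 3]
layers = [dense(w, d) for d, w in zip(widths[:-1], widths[1:])]
layers.append(dense(2, widths[-1], activation="softmax"))
model = tf.keras.Sequential(layers)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # learning rate 10^-4
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# Training with mini-batches of size 256, e.g.:
# model.fit(X_train, y_train, batch_size=256, epochs=...,
#           validation_data=(X_test, y_test))
```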