Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression

Authors: Ivan Butakov, Alexander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov, Kirill Andreev

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values and comparison with MINE (Belghazi et al., 2018). Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics.
Researcher Affiliation | Academia | Ivan Butakov (1,2,3), Alexander Tolmachev (1,2), Sofia Malanchuk (1,2), Anna Neopryatnaya (1,2), Alexey Frolov (1), Kirill Andreev (1). (1) Skolkovo Institute of Science and Technology; (2) Moscow Institute of Physics and Technology; (3) Sirius University of Science and Technology.
Pseudocode | Yes | Algorithm 1 Measure mutual information estimation quality on high-dimensional synthetic datasets ... Algorithm 2 Estimate information flow in the neural network during training
Open Source Code | Yes | Due to the high complexity of the used f and g, we do not define these functions in the main text; instead, we refer to the source code published along with the paper (Butakov et al.). ... Ivan Butakov, Aleksander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov, and Kirill Andreev. Package for information-theoretic data analysis. URL https://github.com/VanessB/Information-v3.
Open Datasets | Yes | The experiment with convolutional DNN classifier of the MNIST handwritten digits dataset (LeCun et al., 2010) is performed.
Dataset Splits | No | The paper mentions training data and a convolutional classifier but does not explicitly detail training, validation, and test splits (e.g., specific percentages or sample counts for each split).
Hardware Specification | Yes | We train our network with a learning rate of 10^-5 using the Nvidia Titan RTX.
Software Dependencies | No | The paper mentions the Adam optimizer and the Leaky ReLU activation function but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We train our network with a learning rate of 10^-5 using the Nvidia Titan RTX. We use d_latent(X) = d_latent(L_i) = 4. For other hyperparameters, we refer to Section F of the Appendix and to the source code (Butakov et al.). ... The autoencoders are trained via Adam ... with a batch size of 5·10^3, a learning rate of 10^-3 and MAE loss for 2·10^3 epochs.
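
A minimal illustration of the estimator idea quoted in the Research Type row: the paper estimates MI between compressed, low-dimensional representations, and Algorithm 1 checks estimation quality on synthetic data with predefined MI values. The sketch below shows one standard candidate for such a base estimator, the Kraskov-Stögbauer-Grassberger (KSG) k-nearest-neighbour estimator, validated against jointly Gaussian data with a closed-form MI. It is an assumption-laden illustration, not the authors' implementation (which lives in the linked repository); the choice of k = 5 and the Gaussian test dimensions are arbitrary.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=5):
    """KSG estimator (variant 1) of I(X; Y) in nats for continuous samples.
    x: array of shape (n, d_x), y: array of shape (n, d_y)."""
    n = x.shape[0]
    joint = np.hstack([x, y])
    # Chebyshev distance to the k-th neighbour in the joint space
    # (k + 1 because the query point itself is returned at distance 0).
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]
    x_tree, y_tree = cKDTree(x), cKDTree(y)
    # Count marginal neighbours strictly inside the joint-space radius.
    nx = np.array([len(x_tree.query_ball_point(xi, ri - 1e-12, p=np.inf)) - 1
                   for xi, ri in zip(x, eps)])
    ny = np.array([len(y_tree.query_ball_point(yi, ri - 1e-12, p=np.inf)) - 1
                   for yi, ri in zip(y, eps)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Sanity check in the spirit of Algorithm 1: jointly Gaussian pairs with a
# known ground-truth MI of -d/2 * log(1 - rho^2) nats.
rng = np.random.default_rng(0)
d, rho, n = 4, 0.7, 5000
x = rng.standard_normal((n, d))
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal((n, d))
print(f"KSG estimate: {ksg_mutual_information(x, y):.3f} nats, "
      f"ground truth: {-0.5 * d * np.log(1 - rho ** 2):.3f} nats")
```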
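A sketch of the compression step from the Experiment Setup row, mirroring only the quoted hyperparameters (latent dimension 4, Adam, batch size 5·10^3, learning rate 10^-3, MAE loss, 2·10^3 epochs). The MLP architectures, the hidden width of 256, and the placeholder data are hypothetical stand-ins; the paper refers to its published source code for the actual encoder and decoder functions f and g.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def make_autoencoder(input_dim, latent_dim=4, hidden=256):
    # Hypothetical MLP stand-in for the paper's compressing autoencoder.
    encoder = nn.Sequential(nn.Linear(input_dim, hidden), nn.LeakyReLU(),
                            nn.Linear(hidden, latent_dim))
    decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.LeakyReLU(),
                            nn.Linear(hidden, input_dim))
    return encoder, decoder

def train_autoencoder(encoder, decoder, data, epochs=2000, batch_size=5000, lr=1e-3):
    # Quoted setup: Adam, batch size 5*10^3, learning rate 10^-3, MAE loss, 2*10^3 epochs.
    opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=lr)
    loss_fn = nn.L1Loss()  # mean absolute error
    loader = DataLoader(TensorDataset(data), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for (batch,) in loader:
            opt.zero_grad()
            loss_fn(decoder(encoder(batch)), batch).backward()
            opt.step()
    return encoder

# Example: compress flattened 28x28 inputs to 4-dimensional latents, which can
# then be fed to the MI estimator sketched above (placeholder data, few epochs).
data = torch.rand(20000, 784)
enc, dec = make_autoencoder(784, latent_dim=4)
encoder = train_autoencoder(enc, dec, data, epochs=2)
latents = encoder(data).detach().numpy()
```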
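A toy version of Algorithm 2 from the Pseudocode row: train a classifier whose hidden layer is made stochastic by additive Gaussian noise (in the spirit of the Goldfeld et al. stochastic-network setting quoted above) and periodically estimate I(X; T) with the KSG sketch defined earlier. The hidden layer is kept 4-dimensional so no compression step is needed here, whereas the paper first compresses high-dimensional convolutional activations; the data, architecture, noise level, and checkpoint schedule are all illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4000, 8)                            # toy inputs
y = (x[:, 0] + 0.5 * x[:, 1] > 0).long()            # toy labels
noise_std = 0.1                                     # additive noise keeps I(X; T) finite

hidden = nn.Sequential(nn.Linear(8, 4), nn.Tanh())  # low-dimensional layer T
head = nn.Linear(4, 2)
opt = torch.optim.Adam([*hidden.parameters(), *head.parameters()], lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(501):
    opt.zero_grad()
    t = hidden(x)
    t = t + noise_std * torch.randn_like(t)         # stochastic hidden layer
    loss_fn(head(t), y).backward()
    opt.step()
    if step % 100 == 0:                             # checkpoint: estimate information flow
        with torch.no_grad():
            t = hidden(x)
            t = (t + noise_std * torch.randn_like(t)).numpy()
        print(f"step {step:4d}: I(X; T) ~ {ksg_mutual_information(x.numpy(), t):.2f} nats")
```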