Information Bottleneck Analysis of Deep Neural Networks via Lossy Compression
Authors: Ivan Butakov, Alexander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov, Kirill Andreev
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach leverages the stochastic NN method proposed by Goldfeld et al. (2019) and incorporates a compression step to overcome the obstacles associated with high dimensionality. In other words, we estimate the MI between the compressed representations of high-dimensional random vectors. The proposed method is supported by both theoretical and practical justifications. Notably, we demonstrate the accuracy of our estimator through synthetic experiments featuring predefined MI values and comparison with MINE (Belghazi et al., 2018). Finally, we perform IB analysis on a close-to-real-scale convolutional DNN, which reveals new features of the MI dynamics. (A minimal MI-estimator sketch, not the authors' implementation, follows this table.) |
| Researcher Affiliation | Academia | Ivan Butakov (1,2,3), Alexander Tolmachev (1,2), Sofia Malanchuk (1,2), Anna Neopryatnaya (1,2), Alexey Frolov (1), Kirill Andreev (1). 1: Skolkovo Institute of Science and Technology; 2: Moscow Institute of Physics and Technology; 3: Sirius University of Science and Technology |
| Pseudocode | Yes | Algorithm 1 Measure mutual information estimation quality on high-dimensional synthetic datasets ... Algorithm 2 Estimate information flow in the neural network during training |
| Open Source Code | Yes | Due to the high complexity of the used f and g, we do not define these functions in the main text; instead, we refer to the source code published along with the paper (Butakov et al.). ... Ivan Butakov, Aleksander Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov, and Kirill Andreev. Package for information-theoretic data analysis. URL https://github.com/VanessB/Information-v3. |
| Open Datasets | Yes | The experiment with convolutional DNN classifier of the MNIST handwritten digits dataset (LeCun et al., 2010) is performed. |
| Dataset Splits | No | The paper mentions training data and a convolutional classifier but does not explicitly detail training, validation, and test splits (e.g., specific percentages or sample counts for each split). |
| Hardware Specification | Yes | We train our network with a learning rate of 10⁻⁵ using the Nvidia Titan RTX. |
| Software Dependencies | No | The paper mentions the Adam optimizer and the Leaky ReLU activation function but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We train our network with a learning rate of 10⁻⁵ using the Nvidia Titan RTX. We use d_latent(X) = d_latent(L_i) = 4. For other hyperparameters, we refer to Section F of the Appendix and to the source code (Butakov et al.). ... The autoencoders are trained via Adam ... with a batch size 5·10³, a learning rate 10⁻³ and MAE loss for 2·10³ epochs. (Illustrative, non-authoritative sketches of the MI estimator and of this training setup follow the table.) |
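
The "Research Type" row quotes the core pipeline: compress high-dimensional vectors to low-dimensional codes, then estimate MI between the codes. The snippet below is a minimal sketch of only the second step, using a standard Kraskov-Stögbauer-Grassberger (KSG) nearest-neighbour estimator rather than the authors' released package; the function name `ksg_mi`, the choice of `k`, and the Gaussian sanity check are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mi(x, y, k=5):
    """KSG (Kraskov et al., 2004) estimate of I(X; Y) in nats.

    x, y: paired samples of shape (N, d_x) and (N, d_y), e.g. the
    low-dimensional codes produced by the compression step.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = len(x)
    joint = np.hstack([x, y])

    # Distance to the k-th nearest neighbour in the joint space (max-norm);
    # index 0 of the query result is the point itself at distance 0.
    eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]

    # Count marginal neighbours strictly inside the radius eps (minus self).
    tree_x, tree_y = cKDTree(x), cKDTree(y)
    nx = np.array([len(tree_x.query_ball_point(x[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])
    ny = np.array([len(tree_y.query_ball_point(y[i], eps[i] - 1e-12, p=np.inf)) - 1
                   for i in range(n)])

    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Sanity check on correlated Gaussians with known MI = -0.5 * log(1 - rho^2).
rng = np.random.default_rng(0)
rho = 0.8
x = rng.standard_normal(5000)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(5000)
print(ksg_mi(x[:, None], y[:, None]), -0.5 * np.log(1 - rho ** 2))
```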
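
The "Experiment Setup" row quotes the hyperparameters used to train the compression autoencoders (Adam, batch size 5·10³, learning rate 10⁻³, MAE loss, 4-dimensional latent space). The sketch below wires those quoted values into a generic PyTorch training loop; the MLP encoder/decoder, the 784-dimensional input (flattened MNIST), the hidden width, and the random stand-in data are assumptions, since the paper defers its actual f and g to the released source code.

```python
import torch
from torch import nn

# Hyperparameters quoted in the "Experiment Setup" row; the MLP architecture
# below is a placeholder, not the paper's actual encoder/decoder (f, g).
D_IN, D_LATENT = 784, 4            # assumed flattened-MNIST input -> 4-dim code
BATCH, LR, EPOCHS = 5_000, 1e-3, 2_000   # reduce EPOCHS for a quick trial run

encoder = nn.Sequential(nn.Linear(D_IN, 256), nn.LeakyReLU(), nn.Linear(256, D_LATENT))
decoder = nn.Sequential(nn.Linear(D_LATENT, 256), nn.LeakyReLU(), nn.Linear(256, D_IN))
autoencoder = nn.Sequential(encoder, decoder)

opt = torch.optim.Adam(autoencoder.parameters(), lr=LR)
loss_fn = nn.L1Loss()              # MAE loss, as stated in the paper

data = torch.rand(10_000, D_IN)    # stand-in for the real activations / images
for epoch in range(EPOCHS):
    for batch in data.split(BATCH):
        opt.zero_grad()
        loss = loss_fn(autoencoder(batch), batch)
        loss.backward()
        opt.step()
```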