How Does Information Bottleneck Help Deep Learning?

Authors: Kenji Kawaguchi, Zhun Deng, Xu Ji, Jiaoyang Huang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theory proves that controlling information bottleneck is one way to control generalization errors in deep learning, although it is not the only or necessary way. We investigate the merit of our new mathematical findings with experiments across a range of architectures and learning settings.
Researcher Affiliation | Academia | NUS, Columbia University, Mila, University of Pennsylvania.
Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is publicly available at: https://github.com/xu-ji/information-bottleneck"
Open Datasets | Yes | "trained 540 deep neural networks on CIFAR10 without explicitly constraining MI"; "We conducted experiments on the MNIST and Fashion MNIST datasets."
Dataset Splits | No | The paper mentions 'test points' and a 'test set' alongside 'training points' and a 'training set' for the various datasets (2D clustered data, MNIST, Fashion MNIST, CIFAR10), but does not explicitly specify a validation split.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud instance details) used to run its experiments.
Software Dependencies | No | The paper mentions the use of 'SWAG' and 'kernel density estimation' but does not specify version numbers for these or any other software libraries or dependencies used in the experiments. (A generic kernel-density-based mutual information estimate is sketched below the table.)
Experiment Setup | Yes | "3 weight decay rates (0, 0.01, 0.1), 3 dataset draws, 3 random seeds... Models were trained for 300 iterations with a learning rate of ηθ = 1e-2... 3 weight decay rates (1e-3, 1e-4, 1e-5), 3 batch sizes (64, 128, 1024). Models were trained for 200 epochs with SGD and a learning rate of 1e-2." (The CIFAR10 grid is sketched below the table.)
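
The hyperparameter grid quoted in the Experiment Setup row can be read as a plain sweep. The sketch below illustrates only the reported CIFAR10 grid (weight decays 1e-3/1e-4/1e-5, batch sizes 64/128/1024, 3 random seeds, SGD with learning rate 1e-2 for 200 epochs); the ResNet-18 architecture, the ToTensor-only transform, and the training loop itself are assumptions for illustration, not details taken from the paper or its repository.

```python
# Minimal sketch of the reported CIFAR10 hyperparameter sweep.
# Model choice and data transform are placeholders (assumptions), not the authors' code.
import itertools

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T


def train_one(weight_decay: float, batch_size: int, seed: int,
              epochs: int = 200, lr: float = 1e-2) -> nn.Module:
    torch.manual_seed(seed)
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor())
    loader = torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=True)

    model = torchvision.models.resnet18(num_classes=10)  # placeholder architecture
    opt = torch.optim.SGD(model.parameters(), lr=lr, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model


# Grid reported in the paper: 3 weight decays x 3 batch sizes x 3 random seeds.
for wd, bs, seed in itertools.product([1e-3, 1e-4, 1e-5], [64, 128, 1024], range(3)):
    train_one(wd, bs, seed)
```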
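
The Software Dependencies row notes that the paper uses kernel density estimation (without version details) when working with mutual information quantities. As a rough, generic illustration only, and not the authors' estimator, the following NumPy/SciPy sketch approximates I(X; T) for hidden representations under the common assumption T = f(X) + Gaussian noise of variance sigma^2: H(T) is estimated with a Gaussian KDE over the representations, and H(T|X) is the closed-form entropy of the noise.

```python
# Generic KDE-based mutual information estimate, assuming hidden representations
# of the form T = f(X) + N(0, sigma^2 I). Illustrative approximation only; this
# is not the specific estimator used in the paper.
import numpy as np
from scipy.special import logsumexp


def kde_entropy(reps: np.ndarray, sigma: float) -> float:
    """Plug-in estimate of H(T) in nats, treating the marginal of T as a mixture
    of Gaussians centred at the rows of `reps` with variance sigma^2."""
    n, d = reps.shape
    sq_dists = np.sum((reps[:, None, :] - reps[None, :, :]) ** 2, axis=-1)
    log_kernel = -0.5 * sq_dists / sigma ** 2
    log_norm = -np.log(n) - 0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    log_density = log_norm + logsumexp(log_kernel, axis=1)
    return float(-np.mean(log_density))


def mi_estimate(reps: np.ndarray, sigma: float) -> float:
    """I(X; T) = H(T) - H(T|X); under the Gaussian-noise assumption,
    H(T|X) is just the entropy of the noise."""
    _, d = reps.shape
    h_t = kde_entropy(reps, sigma)
    h_t_given_x = 0.5 * d * np.log(2.0 * np.pi * np.e * sigma ** 2)
    return h_t - h_t_given_x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(512, 32))    # stand-in for penultimate-layer activations
    print(mi_estimate(hidden, sigma=0.5))  # estimated I(X; T) in nats
```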