How Does Information Bottleneck Help Deep Learning?
Authors: Kenji Kawaguchi, Zhun Deng, Xu Ji, Jiaoyang Huang
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theory proves that controlling information bottleneck is one way to control generalization errors in deep learning, although it is not the only or necessary way. We investigate the merit of our new mathematical findings with experiments across a range of architectures and learning settings. |
| Researcher Affiliation | Academia | NUS; Columbia University; Mila; University of Pennsylvania. |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/xu-ji/information-bottleneck |
| Open Datasets | Yes | "trained 540 deep neural networks on CIFAR10 without explicitly constraining MI"; "We conducted experiments on the MNIST and Fashion MNIST datasets." |
| Dataset Splits | No | The paper mentions 'test points' and 'test set' in relation to 'training points' and 'training set' for various datasets (2D clustered, MNIST, Fashion MNIST, CIFAR10), but does not explicitly specify a validation dataset split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud instance details) used to run its experiments. |
| Software Dependencies | No | The paper mentions using SWAG and kernel density estimation, but does not specify version numbers for these or for any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | 3 weight decay rates (0, 0.01, 0.1), 3 dataset draws, 3 random seeds... Models were trained for 300 iterations with a learning rate of ηθ = 1e-2... 3 weight decay rates (1e-3, 1e-4, 1e-5), 3 batch sizes (64, 128, 1024). Models were trained for 200 epochs with SGD and a learning rate of 1e-2. |
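The CIFAR10 portion of the setup row above amounts to a small hyperparameter sweep: SGD with learning rate 1e-2, 200 epochs, weight decay in {1e-3, 1e-4, 1e-5}, and batch size in {64, 128, 1024}. The sketch below shows what such a sweep could look like in PyTorch. It is a minimal illustration, not the authors' code (their repository is linked in the table); the network architecture, data transforms, and training loop are assumptions introduced here for clarity.

```python
# Minimal PyTorch sketch of the CIFAR10 sweep described in the "Experiment Setup" row:
# SGD, lr 1e-2, 200 epochs, weight decay in {1e-3, 1e-4, 1e-5}, batch size in {64, 128, 1024}.
# The model and transforms are illustrative placeholders, not the paper's exact configuration.
import itertools

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


def make_model() -> nn.Module:
    # Placeholder convolutional classifier; the paper's architectures may differ.
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(64 * 8 * 8, 10),
    )


def train_one(weight_decay: float, batch_size: int, epochs: int = 200) -> float:
    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor()
    )
    loader = torch.utils.data.DataLoader(
        train_set, batch_size=batch_size, shuffle=True, num_workers=2
    )

    model = make_model().to(DEVICE)
    opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(DEVICE), y.to(DEVICE)
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return loss.item()  # final mini-batch loss, for logging only


if __name__ == "__main__":
    # Sweep the grid of weight decays and batch sizes listed in the table row.
    for wd, bs in itertools.product([1e-3, 1e-4, 1e-5], [64, 128, 1024]):
        final_loss = train_one(wd, bs)
        print(f"weight_decay={wd}, batch_size={bs}, final loss={final_loss:.4f}")
```

The table also mentions 3 dataset draws and 3 random seeds per configuration; repeating the sweep with different seeds (e.g., via `torch.manual_seed`) would mirror that, but is omitted here to keep the sketch short.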