Slicing Mutual Information Generalization Bounds for Neural Networks
Authors: Kimia Nadjahi, Kristjan Greenewald, Rickard Brüel Gabrielsson, Justin Solomon
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically validate our results and achieve the computation of non-vacuous information-theoretic generalization bounds for neural networks, a task that was previously out of reach. |
| Researcher Affiliation | Collaboration | 1: MIT, 2: MIT-IBM Watson AI Lab; IBM Research. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code to reproduce the experiments. Code is available here: https://github.com/kimiandj/slicing_mi_generalization |
| Open Datasets | Yes | We train fully-connected NNs to classify MNIST and CIFAR-10 datasets... We classify the Iris dataset (Fisher, 1936). |
| Dataset Splits | No | The paper mentions training on 'a random subset of MNIST with n = 1000 samples' and evaluating on 'a test dataset of 10 000 samples', and for logistic regression states 'We compute the test error on 20n/80 observations.', indicating train/test splits. However, it does not explicitly define a separate validation split or its size/percentage. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2017) with default parameters (on PyTorch)' but does not specify the version of PyTorch or other software dependencies. |
| Experiment Setup | Yes | The network is trained for 200 epochs and a batch size of 64, using the Adam optimizer (Kingma & Ba, 2017) with default parameters... To train our NNs, we run Adam (Kingma & Ba, 2017) with default parameters for 30 epochs and batch size of 64 or 128... We use Adam with a learning rate of 0.1 as optimizer, for 200 epochs and batch size of 64... For each Θ, we train for 20 epochs using the Adam optimizer with a batch size of 256, learning rate η = 0.01 for w1 and η/10 for w2, and other parameters set to their default values (Kingma & Ba, 2017). During training, we clamp the norm of each layer's weight matrix at the end of each iteration to satisfy the condition in Theorem B.2. (See the sketch after this table.) |
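
As a rough illustration of the setup quoted in the Experiment Setup row, the following is a minimal PyTorch sketch, not the authors' released code: it trains a fully-connected network on a random MNIST subset of n = 1000 samples with Adam (default parameters), 200 epochs, and batch size 64, and clamps each layer's weight-matrix norm after every iteration. The hidden width, the norm bound `max_norm`, and the data path are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' code) of the quoted training setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

torch.manual_seed(0)

# Random subset of MNIST with n = 1000 training samples; full 10 000-sample test set.
transform = transforms.ToTensor()
train_full = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)
train_idx = torch.randperm(len(train_full))[:1000].tolist()
train_loader = DataLoader(Subset(train_full, train_idx), batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

# Fully-connected classifier (hidden width 200 is an assumption).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 200), nn.ReLU(), nn.Linear(200, 10))
optimizer = torch.optim.Adam(model.parameters())  # default parameters, as quoted
criterion = nn.CrossEntropyLoss()

max_norm = 10.0  # hypothetical bound standing in for the condition in Theorem B.2

for epoch in range(200):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        # Clamp the (Frobenius) norm of each layer's weight matrix after each iteration.
        with torch.no_grad():
            for layer in model:
                if isinstance(layer, nn.Linear):
                    norm = layer.weight.norm()
                    if norm > max_norm:
                        layer.weight.mul_(max_norm / norm)

# Test error on the 10 000 held-out MNIST samples.
correct = 0
with torch.no_grad():
    for x, y in test_loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
print(f"test error: {1 - correct / len(test_set):.4f}")
```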