Slicing Mutual Information Generalization Bounds for Neural Networks
Authors: Kimia Nadjahi, Kristjan Greenewald, Rickard Brüel Gabrielsson, Justin Solomon
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically validate our results and achieve the computation of non-vacuous information-theoretic generalization bounds for neural networks, a task that was previously out of reach. |
| Researcher Affiliation | Collaboration | 1: MIT, 2: MIT-IBM Watson AI Lab; IBM Research. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code to reproduce the experiments. Code is available here: https://github.com/kimiandj/slicing_mi_generalization |
| Open Datasets | Yes | We train fully-connected NNs to classify MNIST and CIFAR-10 datasets... We classify the Iris dataset (Fisher, 1936). |
| Dataset Splits | No | The paper mentions training on 'a random subset of MNIST with n = 1000 samples' and evaluating on 'a test dataset of 10 000 samples', and for logistic regression states 'We compute the test error on 20n/80 observations.', indicating train/test splits. However, it does not explicitly define a separate validation split or its size/percentage. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2017) with default parameters (on PyTorch)' but does not specify the version of PyTorch or other software dependencies. |
| Experiment Setup | Yes | The network is trained for 200 epochs and a batch size of 64, using the Adam optimizer (Kingma & Ba, 2017) with default parameters... To train our NNs, we run Adam (Kingma & Ba, 2017) with default parameters for 30 epochs and batch size of 64 or 128... We use Adam with a learning rate of 0.1 as optimizer, for 200 epochs and batch size of 64... For each Θ, we train for 20 epochs using the Adam optimizer with a batch size of 256, learning rate η = 0.01 for w1 and η/10 for w2, and other parameters set to their default values (Kingma & Ba, 2017). During training, we clamp the norm of each layer's weight matrix at the end of each iteration to satisfy the condition in Theorem B.2. (See the sketch after this table.) |
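
As a rough illustration of the setup quoted in the Experiment Setup row, the following is a minimal PyTorch sketch, not the authors' released code: it trains a fully-connected network on a random MNIST subset of n = 1000 samples with Adam (default parameters), 200 epochs, and batch size 64, and clamps each layer's weight-matrix norm after every iteration. The hidden width, the norm bound `max_norm`, and the data path are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' code) of the quoted training setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

torch.manual_seed(0)

# Random subset of MNIST with n = 1000 training samples; full 10 000-sample test set.
transform = transforms.ToTensor()
train_full = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)
train_idx = torch.randperm(len(train_full))[:1000].tolist()
train_loader = DataLoader(Subset(train_full, train_idx), batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

# Fully-connected classifier (hidden width 200 is an assumption).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 200), nn.ReLU(), nn.Linear(200, 10))
optimizer = torch.optim.Adam(model.parameters())  # default parameters, as quoted
criterion = nn.CrossEntropyLoss()

max_norm = 10.0  # hypothetical bound standing in for the condition in Theorem B.2

for epoch in range(200):
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        # Clamp the (Frobenius) norm of each layer's weight matrix after each iteration.
        with torch.no_grad():
            for layer in model:
                if isinstance(layer, nn.Linear):
                    norm = layer.weight.norm()
                    if norm > max_norm:
                        layer.weight.mul_(max_norm / norm)

# Test error on the 10 000 held-out MNIST samples.
correct = 0
with torch.no_grad():
    for x, y in test_loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
print(f"test error: {1 - correct / len(test_set):.4f}")
```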