Information-theoretic generalization bounds for black-box learning algorithms
Authors: Hrayr Harutyunyan, Maxim Raginsky, Greg Ver Steeg, Aram Galstyan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compute our most efficient bound on realistic classification problems involving neural networks, and show that the bound closely follows the generalization error, even in situations when a neural network with 3M parameters is trained deterministically on 4000 examples, achieving 1% generalization error. As mentioned earlier, the expected generalization gap bound of Corollary 2 is significantly easier to compute compared to existing information-theoretic bounds, and does not give trivial results for deterministic algorithms. To understand how well the bound does in challenging situations, we consider cases when the algorithm generalizes well despite the high complexity of the hypothesis class and relatively small number of training examples. Due to space constraints we omit some experimental details and present them in Appendix B. The code can be found at github.com/hrayrhar/f-CMI. |
| Researcher Affiliation | Academia | 1 USC Information Sciences Institute, 2 University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper provides mathematical theorems and proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code can be found at github.com/hrayrhar/f-CMI. |
| Open Datasets | Yes | First, we consider the MNIST 4 vs 9 digit classification task [20] using a 4-layer convolutional neural network (CNN) that has approximately 200K parameters. Next, we move away from binary classification and consider the CIFAR-10 classification task [19]. |
| Dataset Splits | No | The paper mentions training and test data, but does not explicitly state the specific split percentages or sample counts for a validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., specific library versions for PyTorch, TensorFlow, or Python). |
| Experiment Setup | Yes | We train the network for 200 epochs using the ADAM algorithm [18] with 0.001 learning rate, β1 = 0.9, and mini-batches of 128 examples. To construct a well-generalizing algorithm, we use the ResNet-50 [16] network pretrained on ImageNet [7], and fine-tune it for 40 epochs using SGD with mini-batches of size 64, 0.01 learning rate, 0.9 momentum, and standard data augmentations. |
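The bound mentioned in the Research Type row (Corollary 2) is assembled from mutual-information terms between the network's predictions and the random train/test membership variables. As a rough illustration of how such a term could be estimated from samples, here is a minimal plug-in mutual-information estimator between discrete predictions and a binary selector. The helper names `plugin_mi` and `bound_term`, and the sqrt(2·I) per-term form, are illustrative assumptions, not the paper's exact estimator; see Corollary 2 and Appendix B of the paper for the precise statement and estimation procedure.

```python
from collections import Counter
from math import log, sqrt

def plugin_mi(xs, ys):
    """Plug-in (empirical) mutual information, in nats, between two discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * log(p_joint / ((px[x] / n) * (py[y] / n)))
    return max(mi, 0.0)  # clip tiny negative values from rounding

def bound_term(preds, selectors):
    """Illustrative sqrt(2*I)-style contribution of one prediction/selector pair."""
    return sqrt(2.0 * plugin_mi(preds, selectors))
```

When the predictions are independent of the selector the estimate is near zero (a small generalization-gap bound); when they are fully determined by it, the estimate approaches log 2 nats.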
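The Experiment Setup row quotes Adam with a 0.001 learning rate and β1 = 0.9 for the MNIST experiments. For reference, a single Adam update can be sketched in plain Python with those values; β2 and ε are not quoted in the table, so the standard defaults 0.999 and 1e-8 are assumed here.

```python
def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w with gradient g at step t (t >= 1)."""
    m = b1 * m + (1 - b1) * g          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction for the warm-up steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v
```

On the first step the bias-corrected update is roughly lr · sign(g), i.e. about 0.001 in magnitude regardless of the gradient's scale.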