Information-theoretic generalization bounds for black-box learning algorithms
Authors: Hrayr Harutyunyan, Maxim Raginsky, Greg Ver Steeg, Aram Galstyan
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compute our most efficient bound on realistic classification problems involving neural networks, and show that the bound closely follows the generalization error, even in situations when a neural network with 3M parameters is trained deterministically on 4000 examples, achieving 1% generalization error. As mentioned earlier, the expected generalization gap bound of Corollary 2 is significantly easier to compute compared to existing information-theoretic bounds, and does not give trivial results for deterministic algorithms. To understand how well the bound does in challenging situations, we consider cases when the algorithm generalizes well despite the high complexity of the hypothesis class and relatively small number of training examples. Due to space constraints we omit some experimental details and present them in Appendix B. The code can be found at github.com/hrayrhar/f-CMI. |
| Researcher Affiliation | Academia | 1 USC Information Sciences Institute, 2 University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper provides mathematical theorems and proofs but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code can be found at github.com/hrayrhar/f-CMI. |
| Open Datasets | Yes | First, we consider the MNIST 4 vs 9 digit classification task [20] using a 4-layer convolutional neural network (CNN) that has approximately 200K parameters. Next, we move away from binary classification and consider the CIFAR-10 classification task [19]. |
| Dataset Splits | No | The paper mentions training and test data, but does not explicitly state the specific split percentages or sample counts for a validation set. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., specific library versions for PyTorch, TensorFlow, or Python). |
| Experiment Setup | Yes | We train the network for 200 epochs using the ADAM algorithm [18] with 0.001 learning rate, β1 = 0.9, and mini-batches of 128 examples. To construct a well-generalizing algorithm, we use the ResNet-50 [16] network pretrained on ImageNet [7], and fine-tune it for 40 epochs using SGD with mini-batches of size 64, 0.01 learning rate, 0.9 momentum, and standard data augmentations. |
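The bound mentioned in the Research Type row (Corollary 2) is assembled from mutual-information terms between the network's predictions and the random train/test membership variables. As a rough illustration of how such a term could be estimated from samples, here is a minimal plug-in mutual-information estimator between discrete predictions and a binary selector. The helper names `plugin_mi` and `bound_term`, and the sqrt(2·I) per-term form, are illustrative assumptions, not the paper's exact estimator; see Corollary 2 and Appendix B of the paper for the precise statement and estimation procedure.

```python
from collections import Counter
from math import log, sqrt

def plugin_mi(xs, ys):
    """Plug-in (empirical) mutual information, in nats, between two discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        p_joint = c / n
        mi += p_joint * log(p_joint / ((px[x] / n) * (py[y] / n)))
    return max(mi, 0.0)  # clip tiny negative values from rounding

def bound_term(preds, selectors):
    """Illustrative sqrt(2*I)-style contribution of one prediction/selector pair."""
    return sqrt(2.0 * plugin_mi(preds, selectors))
```

When the predictions are independent of the selector the estimate is near zero (a small generalization-gap bound); when they are fully determined by it, the estimate approaches log 2 nats.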
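The Experiment Setup row quotes Adam with a 0.001 learning rate and β1 = 0.9 for the MNIST experiments. For reference, a single Adam update can be sketched in plain Python with those values; β2 and ε are not quoted in the table, so the standard defaults 0.999 and 1e-8 are assumed here.

```python
def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter w with gradient g at step t (t >= 1)."""
    m = b1 * m + (1 - b1) * g          # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g      # second-moment (uncentered variance) estimate
    m_hat = m / (1 - b1 ** t)          # bias correction for the warm-up steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    return w, m, v
```

On the first step the bias-corrected update is roughly lr · sign(g), i.e. about 0.001 in magnitude regardless of the gradient's scale.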