Rethinking Information-theoretic Generalization: Loss Entropy Induced PAC Bounds
Authors: Yuxin Dong, Tieliang Gong, Hong Chen, Shujian Yu, Chen Li
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive numerical studies indicate strong correlations between the generalization error and the induced loss entropy, showing that the presented bounds adeptly capture the patterns of the true generalization gap under various learning scenarios. ... In this section, we conduct empirical comparisons between the generalization bounds established in this paper and the previous high-probability bounds proposed in (Kawaguchi et al., 2023; Hellström & Durisi, 2022). Our evaluation involves two sets of experiments: Firstly, we investigate data-independent bounds using synthetic 2D Gaussian datasets by employing a simple MLP network as the classifier, which follows the same learning settings as (Kawaguchi et al., 2023). Secondly, we evaluate data-dependent bounds by training more complex neural networks on real-world image classification datasets (4-layer CNN on MNIST (LeCun & Cortes, 2010) and ResNet-50 (He et al., 2016) on CIFAR10 (Krizhevsky et al., 2009)). (A minimal sketch of the synthetic 2D Gaussian setup appears below the table.) |
| Researcher Affiliation | Academia | Yuxin Dong1, Tieliang Gong1, Hong Chen2, Shujian Yu3 & Chen Li1; 1Xi'an Jiaotong University, 2Huazhong Agricultural University, 3Vrije Universiteit Amsterdam |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides theoretical proofs and descriptions of methods. |
| Open Source Code | Yes | Reproducibility Statement. To ensure reproducibility, we include complete proofs of our theoretical results in Appendix B, C and D, detailed explanations of our experimental settings in Appendix F, and source codes at https://github.com/Yuxin-Dong/Loss-Entropy. |
| Open Datasets | Yes | Secondly, we evaluate data-dependent bounds by training more complex neural networks on real-world image classification datasets (4-layer CNN on MNIST (LeCun & Cortes, 2010) and ResNet-50 (He et al., 2016) on CIFAR10 (Krizhevsky et al., 2009)). |
| Dataset Splits | Yes | The training dataset S = {Z_i}_{i=1}^n ∈ Z^n is constructed by i.i.d. sampling from the unknown data-generating distribution µ. ... Specifically, we consider training a 4-layer CNN on binarized MNIST data, which is restricted to comprise only digits 4 and 9. Additionally, we engage in fine-tuning a pre-trained ResNet-50 model on the CIFAR10 dataset. ... The values of k1, k2 and the experimental settings are kept the same as Harutyunyan et al. (2021). (A sketch of the digit-4/9 restriction appears below the table.) |
| Hardware Specification | Yes | In this paper, deep learning models are trained with an Intel Xeon CPU (2.10GHz, 48 cores), 256GB memory, and 4 Nvidia Tesla V100 GPUs (32GB). |
| Software Dependencies | No | The paper mentions "Python" and the "Scipy package", and refers to an implementation from Kawaguchi et al. (2023), but it does not specify version numbers for Python, any deep learning framework (e.g., PyTorch, TensorFlow), or SciPy. |
| Experiment Setup | Yes | 216 models in total are trained according to different combinations of the following options: 4 MLP encoders ([256, 256, 128, 128], [128, 128, 64, 64], [64, 64, 32, 32], [32, 32, 16, 16]), 3 weight-decay rates (0, 0.01, 0.1), 3 dataset draws and 3 random seeds. The models are designed under the variational setting, where the encoder is trained to characterize a conditional distribution for the representation given the input via deterministic means and standard deviations. The reparameterization trick is used for optimization. Models are trained for 300 epochs with a learning rate of 0.01. The conditional and unconditional loss entropies are estimated using a simple Gaussian kernel density estimator, where the kernel width is automatically selected by the well-known rule-of-thumb criterion. (A sketch of the KDE-based entropy estimate appears below the table.) |
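
The data-independent bound experiments quoted under Research Type train a simple MLP classifier on synthetic 2D Gaussian data and compare the bounds against the observed generalization gap. Below is a minimal sketch of that kind of setup, assuming scikit-learn for the classifier; the class means, sample sizes, and hidden-layer widths are illustrative placeholders, not the settings of Kawaguchi et al. (2023) that the paper follows.

```python
# Minimal sketch (not the authors' code): a synthetic 2D Gaussian binary
# classification task with a small MLP, in the spirit of the data-independent
# bound experiments. All concrete values (means, spread, sample sizes,
# hidden sizes) are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

def make_gaussian_data(n, mean0=(-1.0, 0.0), mean1=(1.0, 0.0), std=1.0):
    """Draw n i.i.d. samples per class from two isotropic 2D Gaussians."""
    x0 = rng.normal(mean0, std, size=(n, 2))
    x1 = rng.normal(mean1, std, size=(n, 2))
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n), np.ones(n)]).astype(int)
    return X, y

X_train, y_train = make_gaussian_data(250)    # training sample S
X_test, y_test = make_gaussian_data(5000)     # large sample as a population proxy

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Empirical generalization gap: population-proxy loss minus training loss.
train_loss = log_loss(y_train, clf.predict_proba(X_train))
test_loss = log_loss(y_test, clf.predict_proba(X_test))
print(f"empirical generalization gap: {test_loss - train_loss:.4f}")
```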
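The Dataset Splits row describes training a 4-layer CNN on a binarized MNIST task restricted to digits 4 and 9. The sketch below shows one way to build such a subset, assuming a PyTorch/torchvision data pipeline, which the paper excerpt does not confirm; the batch size and the label mapping to {0, 1} are illustrative.

```python
# Minimal sketch (assumption: PyTorch/torchvision, not confirmed by the paper):
# restrict MNIST to digits 4 and 9 and relabel them as a binary task.
import torch
from torch.utils.data import Subset, DataLoader
from torchvision import datasets, transforms

transform = transforms.ToTensor()
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=transform)

# Keep only the two digit classes used for the binarized task.
keep = (mnist.targets == 4) | (mnist.targets == 9)
indices = torch.nonzero(keep, as_tuple=False).squeeze(1)
binary_mnist = Subset(mnist, indices.tolist())

def collate(batch):
    """Map labels {4, 9} -> {0, 1} while batching."""
    xs, ys = zip(*batch)
    xs = torch.stack(xs)
    ys = torch.tensor([0 if y == 4 else 1 for y in ys])
    return xs, ys

loader = DataLoader(binary_mnist, batch_size=128, shuffle=True, collate_fn=collate)
```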
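The Experiment Setup row states that conditional and unconditional loss entropies are estimated with a Gaussian kernel density estimator whose bandwidth is selected by a rule-of-thumb criterion, and the paper mentions the SciPy package. The sketch below shows a plausible plug-in (resubstitution) estimate of the unconditional loss entropy with scipy.stats.gaussian_kde, whose default bandwidth is Scott's rule of thumb; the estimator form and the placeholder loss values are assumptions, and the conditional variant would additionally condition on the training sample.

```python
# Minimal sketch (not the authors' implementation): KDE-based loss entropy.
import numpy as np
from scipy.stats import gaussian_kde

def loss_entropy(losses: np.ndarray) -> float:
    """Plug-in estimate of the differential entropy H(L) = -E[log p(L)]."""
    kde = gaussian_kde(losses)       # bandwidth chosen by Scott's rule of thumb
    densities = kde(losses)          # estimated density at each observed loss
    return float(-np.mean(np.log(densities + 1e-12)))

# Example with placeholder per-sample loss values from a trained model.
rng = np.random.default_rng(0)
losses = rng.gamma(shape=2.0, scale=0.1, size=1000)
print(f"estimated loss entropy: {loss_entropy(losses):.4f}")
```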