Rethinking Information-theoretic Generalization: Loss Entropy Induced PAC Bounds

Authors: Yuxin Dong, Tieliang Gong, Hong Chen, Shujian Yu, Chen Li

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive numerical studies indicate strong correlations between the generalization error and the induced loss entropy, showing that the presented bounds adeptly capture the patterns of the true generalization gap under various learning scenarios. ... In this section, we conduct empirical comparisons between the generalization bounds established in this paper and the previous high-probability bounds proposed in (Kawaguchi et al., 2023; Hellström & Durisi, 2022). Our evaluation involves two sets of experiments: Firstly, we investigate data-independent bounds using synthetic 2D Gaussian datasets by employing a simple MLP network as the classifier, which follows the same learning settings as (Kawaguchi et al., 2023). Secondly, we evaluate data-dependent bounds by training more complex neural networks on real-world image classification datasets (4-layer CNN on MNIST (LeCun & Cortes, 2010) and ResNet-50 (He et al., 2016) on CIFAR10 (Krizhevsky et al., 2009)). (A hedged sketch of the synthetic 2D Gaussian setup appears after the table.)
Researcher Affiliation | Academia | Yuxin Dong (1), Tieliang Gong (1), Hong Chen (2), Shujian Yu (3) & Chen Li (1); (1) Xi'an Jiaotong University, (2) Huazhong Agricultural University, (3) Vrije Universiteit Amsterdam
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It provides theoretical proofs and descriptions of methods.
Open Source Code | Yes | Reproducibility Statement. To ensure reproducibility, we include complete proofs of our theoretical results in Appendix B, C and D, detailed explanations of our experimental settings in Appendix F, and source codes at https://github.com/Yuxin-Dong/Loss-Entropy.
Open Datasets | Yes | Secondly, we evaluate data-dependent bounds by training more complex neural networks on real-world image classification datasets (4-layer CNN on MNIST (LeCun & Cortes, 2010) and ResNet-50 (He et al., 2016) on CIFAR10 (Krizhevsky et al., 2009)).
Dataset Splits | Yes | The training dataset S = {Z_i}_{i=1}^n ∈ Z^n is constructed by i.i.d. sampling from the unknown data-generating distribution µ. ... Specifically, we consider training a 4-layer CNN on binarized MNIST data, which is restricted to comprise only digits 4 and 9. Additionally, we engage in fine-tuning a pre-trained ResNet-50 model on the CIFAR10 dataset. ... The value of k1, k2 and experimental settings are kept the same as Harutyunyan et al. (2021). (A hedged sketch of the binarized MNIST split appears after the table.)
Hardware Specification | Yes | In this paper, deep learning models are trained with an Intel Xeon CPU (2.10GHz, 48 cores), 256GB memory, and 4 Nvidia Tesla V100 GPUs (32GB).
Software Dependencies | No | The paper mentions "Python" and the "Scipy package", and refers to an implementation from Kawaguchi et al. (2023), but does not specify version numbers for Python, for any deep learning framework (e.g., PyTorch, TensorFlow), or for SciPy and other libraries.
Experiment Setup | Yes | 216 models in total are trained according to different combinations of the following options: 4 MLP encoders ([256, 256, 128, 128], [128, 128, 64, 64], [64, 64, 32, 32], [32, 32, 16, 16]), 3 weight-decay rates (0, 0.01, 0.1), 3 dataset draws and 3 random seeds. The models are designed under the variational setting, where the encoder is trained to characterize a conditional distribution for the representation given the input via deterministic means and standard deviations. The reparameterization trick is used for optimization. Models are trained for 300 epochs with a learning rate of 0.01. The conditional and unconditional loss entropies are estimated using a simple Gaussian kernel density estimator, where the kernel width is automatically selected by the well-known rule-of-thumb criterion. (A hedged sketch of the kernel density entropy estimate appears after the table.)
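
The data-independent-bound experiments quoted under Research Type train a simple MLP classifier on synthetic 2D Gaussian data. Below is a minimal sketch of such a setup; the class means, noise scale, layer widths, optimizer settings, and the use of PyTorch are illustrative assumptions, not details taken from the paper or from Kawaguchi et al. (2023).

    # Sketch: small MLP on a two-class 2-D Gaussian dataset (illustrative assumptions).
    import numpy as np
    import torch
    import torch.nn as nn

    def make_2d_gaussian_dataset(n_per_class=500, seed=0):
        # Two Gaussian blobs centred at (-1, 0) and (+1, 0); labels 0 and 1.
        rng = np.random.default_rng(seed)
        x0 = rng.normal(loc=(-1.0, 0.0), scale=0.5, size=(n_per_class, 2))
        x1 = rng.normal(loc=(+1.0, 0.0), scale=0.5, size=(n_per_class, 2))
        x = np.concatenate([x0, x1]).astype(np.float32)
        y = np.concatenate([np.zeros(n_per_class), np.ones(n_per_class)]).astype(np.int64)
        return torch.from_numpy(x), torch.from_numpy(y)

    class SimpleMLP(nn.Module):
        def __init__(self, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(2, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 2),
            )

        def forward(self, x):
            return self.net(x)

    x, y = make_2d_gaussian_dataset()
    model = SimpleMLP()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss(reduction="none")  # keep per-sample losses
    for _ in range(100):
        opt.zero_grad()
        per_sample_losses = loss_fn(model(x), y)
        per_sample_losses.mean().backward()
        opt.step()

Keeping per-sample losses (reduction="none") is what a loss-entropy estimate, such as the one sketched last below, would consume.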
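
The Dataset Splits row mentions MNIST restricted to digits 4 and 9. The following sketch builds such a binarized split, assuming a torchvision-based pipeline and a 4-vs-9 relabelling to {0, 1}; the paper does not state which framework or relabelling convention it uses.

    # Sketch: restrict MNIST to digits 4 and 9 and relabel as a binary task (assumptions noted above).
    from torchvision import datasets, transforms

    def binarized_mnist(root="./data", train=True):
        mnist = datasets.MNIST(root, train=train, download=True,
                               transform=transforms.ToTensor())
        mask = (mnist.targets == 4) | (mnist.targets == 9)
        mnist.data = mnist.data[mask]
        mnist.targets = (mnist.targets[mask] == 9).long()  # 4 -> 0, 9 -> 1
        return mnist

    train_set = binarized_mnist(train=True)
    print(len(train_set), train_set.targets[:10])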
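
The Experiment Setup row states that conditional and unconditional loss entropies are estimated with a Gaussian kernel density estimator whose bandwidth is selected by a rule-of-thumb criterion. The sketch below assumes SciPy's gaussian_kde (whose default bandwidth is Scott's rule of thumb) and a simple resubstitution entropy estimate; the authors' exact estimator may differ.

    # Sketch: differential-entropy estimate of a 1-D loss sample via Gaussian KDE.
    import numpy as np
    from scipy.stats import gaussian_kde

    def loss_entropy(losses):
        """Resubstitution estimate H = -mean(log p_hat(L_i)) of the loss entropy.

        The bandwidth is SciPy's default rule of thumb (Scott's rule).
        """
        kde = gaussian_kde(losses)                 # fit Gaussian KDE to the loss sample
        log_density = np.log(kde(losses) + 1e-12)  # evaluate at the samples; guard against log(0)
        return float(-log_density.mean())

    # Toy usage: tightly concentrated losses give lower entropy than dispersed ones.
    rng = np.random.default_rng(0)
    print(loss_entropy(rng.normal(0.05, 0.01, size=1000)))
    print(loss_entropy(rng.normal(0.50, 0.30, size=1000)))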