Tighter Information-Theoretic Generalization Bounds from Supersamples
Authors: Ziqiao Wang, Yongyi Mao
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically compare some CMI and MI bounds discussed in our paper. Our first experiment is based on a synthetic Gaussian dataset, where a simple linear classifier (with a softmax output layer) will be trained. The second experiment follows the same deep learning setting as (Harutyunyan et al., 2021; Hellström & Durisi, 2022a), where we will train a 4-layer CNN on MNIST (LeCun et al., 2010) and fine-tune a ResNet-50 (He et al., 2016) (pretrained on ImageNet (Deng et al., 2009)) on CIFAR10 (Krizhevsky, 2009). |
| Researcher Affiliation | Academia | Ziqiao Wang, Yongyi Mao. Department of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada. Correspondence to: Ziqiao Wang <zwang286@uottawa.ca>, Yongyi Mao <ymao@uottawa.ca>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | Notice that our code is primarily the same as the code provided by Hellström & Durisi (2022a), which is originally based on the code in https://github.com/hrayrhar/f-CMI. The paper does not explicitly state that the authors are releasing their code for the work described in this paper. |
| Open Datasets | Yes | Our first experiment is based on a synthetic Gaussian dataset... train a 4-layer CNN on MNIST (LeCun et al., 2010) and fine-tune a ResNet-50 (He et al., 2016) (pretrained on ImageNet (Deng et al., 2009)) on CIFAR10 (Krizhevsky, 2009). |
| Dataset Splits | No | The paper mentions using training data and early stopping, but does not explicitly provide training/validation/test dataset splits (exact percentages, sample counts, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | Yes | All these experiments are conducted using NVIDIA Tesla V100 GPUs with 32 GB of memory. |
| Software Dependencies | No | The paper mentions using the 'scikit-learn' package and optimizers like 'Adam' and 'SGD', but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Specifically, we choose the dimension of data X to be 5 and we create different classes of points normally distributed (with standard deviation 1) about vertices of a 5-dimensional hypercube, whose side length can be manually controlled. In addition, we utilize full-batch gradient descent with a fixed learning rate of 0.01 to train the linear classifier. We perform training for a total of 500 epochs, and we employ early stopping when the training error reaches a sufficiently low threshold (e.g., < 0.5%). ... For the CNN on the binary MNIST dataset, we set k1 = 5 and k2 = 30. The 4-layer CNN model is trained using the Adam optimizer with a learning rate of 0.001 and a momentum coefficient of β1 = 0.9. The training process spans 200 epochs, with a batch size of 128. For ResNet-50 on CIFAR10, we set k1 = 2 and k2 = 40. The ResNet model is trained using stochastic gradient descent (SGD) with a learning rate of 0.01 and a momentum coefficient of 0.9 for a total of 40 epochs. The batch size for this experiment is set to 64. In the SGLD experiment, we once again train a 4-layer CNN on the binary MNIST dataset. The batch size is set to 100, and the training lasts for 40 epochs. The initial learning rate is 0.01 and decays by a factor of 0.9 after every 100 iterations. Letting t be the iteration index, the inverse temperature of SGLD is given by min{4000, max{100, 10e^{t/100}}}. (See the sketches after this table.) |
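
The synthetic Gaussian setup quoted in the Experiment Setup row (points with standard deviation 1 placed about vertices of a 5-dimensional hypercube, with a manually controlled side length) closely matches the behaviour of scikit-learn's `make_classification`, and the paper reports using the scikit-learn package. The sketch below is one way to reproduce such data; the function name and the specific parameter choices (binary labels, one cluster per class, no label noise) are assumptions, with `class_sep` standing in for the controllable side length.

```python
# Minimal sketch (not the authors' code): 5-dimensional Gaussian clusters
# (std 1) centred on vertices of a hypercube, with controllable side length.
from sklearn.datasets import make_classification


def make_gaussian_hypercube(n_samples, class_sep, seed=0):
    X, y = make_classification(
        n_samples=n_samples,
        n_features=5,            # data dimension 5, as stated in the quote
        n_informative=5,         # all dimensions informative (assumption)
        n_redundant=0,
        n_repeated=0,
        n_classes=2,             # binary labels (assumption)
        n_clusters_per_class=1,
        flip_y=0.0,              # no label noise (assumption)
        class_sep=class_sep,     # controls the hypercube side length
        random_state=seed,
    )
    return X, y
```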
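
Likewise, the SGLD schedules quoted above (learning rate 0.01 decayed by a factor of 0.9 every 100 iterations, inverse temperature min{4000, max{100, 10e^{t/100}}}) can be written as a small helper. This is a sketch of the stated formulas only; the function name and the step-decay interpretation of the learning-rate schedule are assumptions.

```python
import math


def sgld_schedule(t, lr0=0.01, decay=0.9, decay_every=100):
    """Learning rate and inverse temperature at iteration t (sketch).

    lr0/decay/decay_every reproduce the quoted step decay; the inverse
    temperature follows min{4000, max{100, 10 e^{t/100}}}.
    """
    lr = lr0 * decay ** (t // decay_every)
    inv_temp = min(4000.0, max(100.0, 10.0 * math.exp(t / 100)))
    return lr, inv_temp
```

In a standard SGLD update, the injected Gaussian noise would have per-parameter variance 2 · lr / inv_temp; the quoted setup does not spell out the update rule, so that detail is left as an assumption rather than encoded above.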