PAC-Bayes Information Bottleneck

Authors: Zifeng Wang, Shao-Lun Huang, Ercan Engin Kuruoglu, Jimeng Sun, Xi Chen, Yefeng Zheng

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: In this section, we aim to verify the interpretability of the proposed notion of IIW by Eq. (15). We monitor the information trajectory when training NNs with a plain cross-entropy loss and SGD, with respect to activation functions (Section 5.1), architecture (Section 5.2), noise ratio (Section 5.3), and batch size (Section 5.4). We also substantiate the superiority of optimal Gibbs posterior inference based on the proposed Algorithm 2, where PIB instead of plain cross entropy is used as the objective function (Section 5.5). We conclude the empirical observations in Section 5.6. Please refer to Appendix D for general experimental setups about the used datasets and NNs. Table 1: Test performance of the proposed PIB algorithm compared with two other common regularization techniques, ℓ2-norm and dropout, on VGG-net (Simonyan & Zisserman, 2014). The 95% confidence intervals are shown in parentheses. Best values are in bold.
Researcher Affiliation | Collaboration | Zifeng Wang (UIUC), Shao-Lun Huang (Tsinghua University), Ercan E. Kuruoglu (Tsinghua University), Jimeng Sun (UIUC), Xi Chen (Tencent), Yefeng Zheng (Tencent)
Pseudocode | Yes | Algorithm 1: Efficient approximate information estimation of I(w; S). Algorithm 2: Optimal Gibbs posterior inference by SGLD (a hedged SGLD update sketch is given after the table).
Open Source Code | Yes | Demo code is at https://github.com/RyanWangZf/PAC-Bayes-IB.
Open Datasets | Yes | All experiments are conducted on MNIST (LeCun et al., 1998) or CIFAR-10 (Krizhevsky et al., 2009). We train a large VGG network (Simonyan & Zisserman, 2014) on four open datasets: CIFAR-10/100 (Krizhevsky et al., 2009), STL10 (Coates et al., 2011), and SVHN (Netzer et al., 2011), as shown in Table 1 (a dataset-loading sketch is given after the table).
Dataset Splits | No | The paper mentions 'train acc' and 'test acc' but does not provide specific details on how dataset splits (e.g., train/validation/test percentages or counts) were defined for reproducibility. It also does not explicitly mention a validation set.
Hardware Specification | Yes | We use one RTX 3070 GPU for all experiments.
Software Dependencies | No | The paper mentions using PyTorch and the Adam optimizer but does not provide specific version numbers for these or any other software dependencies, which are necessary for a reproducible description.
Experiment Setup | Yes | Specifically, for the Bayesian inference experiment, the batch size is picked within {8, 16, 32, 64, 128, 256, 512}; the learning rate is in {1e-4, 1e-3, 1e-2, 1e-1}; the weight decay of the ℓ2-norm is in {1e-3, 1e-4, 1e-5, 1e-6}; the noise scale of SGLD is in {1e-4, 1e-6, 1e-8, 1e-10}; β of PAC-Bayes IB is in {1e-1, 1e-2, 1e-3}; and the dropout rate is fixed at 0.1 for the dropout regularization (a hyperparameter-grid sketch is given after the table).
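
The Pseudocode row cites Algorithm 2, optimal Gibbs posterior inference by SGLD. Below is a minimal sketch of a single SGLD update in PyTorch, assuming a generic training loss in place of the paper's PIB objective (which is not reproduced here); the exact coupling between the learning rate and the paper's "noise scale" hyperparameter is an assumption, and `sgld_step` is an illustrative helper, not the authors' code.

```python
import torch

def sgld_step(params, lr, noise_scale):
    """One SGLD update: a gradient step plus Gaussian noise on each parameter.

    `noise_scale` mirrors the 'noise scale of SGLD' hyperparameter listed in the
    Experiment Setup row; how it enters the noise variance in Algorithm 2 is an
    assumption in this sketch.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            noise = torch.randn_like(p) * (2.0 * lr * noise_scale) ** 0.5
            p.add_(-lr * p.grad + noise)

# Illustrative usage: compute the loss, backpropagate, then apply the SGLD step.
# loss = criterion(model(x), y)   # the paper uses the PIB objective instead
# model.zero_grad(); loss.backward()
# sgld_step(model.parameters(), lr=1e-3, noise_scale=1e-6)
```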
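For the Open Datasets row, the datasets named in the paper are all available through torchvision. The sketch below only loads them with their default train/test splits and a bare ToTensor transform; any normalization or augmentation the authors used is not documented here and is left out.

```python
from torchvision import datasets, transforms

# Default torchvision train/test splits; the paper does not document a
# separate validation split, so none is constructed here.
to_tensor = transforms.ToTensor()

mnist_train    = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
cifar10_train  = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
cifar10_test   = datasets.CIFAR10("./data", train=False, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100("./data", train=True, download=True, transform=to_tensor)
stl10_train    = datasets.STL10("./data", split="train", download=True, transform=to_tensor)
svhn_train     = datasets.SVHN("./data", split="train", download=True, transform=to_tensor)
```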
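The Experiment Setup row lists the hyperparameter ranges verbatim. The sketch below writes them out as a Python search space; whether the authors swept the full Cartesian product or tuned each value per experiment is not stated, so the exhaustive grid (and the `train_and_evaluate` entry point) is only one possible, hypothetical reading.

```python
from itertools import product

# Search space quoted in the Experiment Setup row.
search_space = {
    "batch_size":   [8, 16, 32, 64, 128, 256, 512],
    "lr":           [1e-4, 1e-3, 1e-2, 1e-1],
    "weight_decay": [1e-3, 1e-4, 1e-5, 1e-6],   # ℓ2-norm regularization
    "sgld_noise":   [1e-4, 1e-6, 1e-8, 1e-10],
    "beta":         [1e-1, 1e-2, 1e-3],          # β of PAC-Bayes IB
    "dropout":      [0.1],                       # fixed for the dropout baseline
}

# One possible reading: an exhaustive grid over all listed values.
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    # train_and_evaluate(config)  # hypothetical training entry point
```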