PAC-Bayes Information Bottleneck

Authors: Zifeng Wang, Shao-Lun Huang, Ercan Engin Kuruoglu, Jimeng Sun, Xi Chen, Yefeng Zheng

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 EXPERIMENTS: In this section, we aim to verify the interpretability of the proposed notion of IIW by Eq. (15). We monitor the information trajectory when training NNs with a plain cross-entropy loss and SGD, with respect to activation functions (Section 5.1), architecture (Section 5.2), noise ratio (Section 5.3), and batch size (Section 5.4). We also substantiate the superiority of optimal Gibbs posterior inference based on the proposed Algorithm 2, where PIB instead of plain cross entropy is used as the objective function (Section 5.5). We conclude the empirical observations in Section 5.6. Please refer to Appendix D for general experimental setups about the used datasets and NNs. Table 1: Test performance of the proposed PIB algorithm compared with two other common regularization techniques, ℓ2-norm and dropout, on VGG-net (Simonyan & Zisserman, 2014). The 95% confidence intervals are shown in parentheses. Best values are in bold.
Researcher Affiliation | Collaboration | Zifeng Wang (UIUC), Shao-Lun Huang (Tsinghua University), Ercan E. Kuruoglu (Tsinghua University), Jimeng Sun (UIUC), Xi Chen (Tencent), Yefeng Zheng (Tencent)
Pseudocode | Yes | Algorithm 1: Efficient approximate information estimation of I(w; S). Algorithm 2: Optimal Gibbs posterior inference by SGLD (a hedged SGLD update sketch is given after the table).
Open Source Code | Yes | Demo code is at https://github.com/RyanWangZf/PAC-Bayes-IB.
Open Datasets | Yes | All experiments are conducted on MNIST (LeCun et al., 1998) or CIFAR-10 (Krizhevsky et al., 2009). We train a large VGG network (Simonyan & Zisserman, 2014) on four open datasets: CIFAR-10/100 (Krizhevsky et al., 2009), STL10 (Coates et al., 2011), and SVHN (Netzer et al., 2011), as shown in Table 1 (a dataset-loading sketch is given after the table).
Dataset Splits | No | The paper mentions 'train acc' and 'test acc' but does not provide specific details on how dataset splits (e.g., train/validation/test percentages or counts) were defined for reproducibility. It also does not explicitly mention a validation set.
Hardware Specification | Yes | We use one RTX 3070 GPU for all experiments.
Software Dependencies | No | The paper mentions using PyTorch and the Adam optimizer but does not provide specific version numbers for these or any other software dependencies, which are necessary for a reproducible description.
Experiment Setup | Yes | Specifically, for the Bayesian inference experiment, the batch size is picked within {8, 16, 32, 64, 128, 256, 512}; the learning rate is in {1e-4, 1e-3, 1e-2, 1e-1}; the weight decay of the ℓ2-norm is in {1e-3, 1e-4, 1e-5, 1e-6}; the noise scale of SGLD is in {1e-4, 1e-6, 1e-8, 1e-10}; β of PAC-Bayes IB is in {1e-1, 1e-2, 1e-3}; and the dropout rate is fixed at 0.1 for the dropout regularization (a hyperparameter-grid sketch is given after the table).
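
The Pseudocode row cites Algorithm 2, optimal Gibbs posterior inference by SGLD. Below is a minimal sketch of a single SGLD update in PyTorch, assuming a generic training loss in place of the paper's PIB objective (which is not reproduced here); the exact coupling between the learning rate and the paper's "noise scale" hyperparameter is an assumption, and `sgld_step` is an illustrative helper, not the authors' code.

```python
import torch

def sgld_step(params, lr, noise_scale):
    """One SGLD update: a gradient step plus Gaussian noise on each parameter.

    `noise_scale` mirrors the 'noise scale of SGLD' hyperparameter listed in the
    Experiment Setup row; how it enters the noise variance in Algorithm 2 is an
    assumption in this sketch.
    """
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            noise = torch.randn_like(p) * (2.0 * lr * noise_scale) ** 0.5
            p.add_(-lr * p.grad + noise)

# Illustrative usage: compute the loss, backpropagate, then apply the SGLD step.
# loss = criterion(model(x), y)   # the paper uses the PIB objective instead
# model.zero_grad(); loss.backward()
# sgld_step(model.parameters(), lr=1e-3, noise_scale=1e-6)
```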
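For the Open Datasets row, the datasets named in the paper are all available through torchvision. The sketch below only loads them with their default train/test splits and a bare ToTensor transform; any normalization or augmentation the authors used is not documented here and is left out.

```python
from torchvision import datasets, transforms

# Default torchvision train/test splits; the paper does not document a
# separate validation split, so none is constructed here.
to_tensor = transforms.ToTensor()

mnist_train    = datasets.MNIST("./data", train=True, download=True, transform=to_tensor)
cifar10_train  = datasets.CIFAR10("./data", train=True, download=True, transform=to_tensor)
cifar10_test   = datasets.CIFAR10("./data", train=False, download=True, transform=to_tensor)
cifar100_train = datasets.CIFAR100("./data", train=True, download=True, transform=to_tensor)
stl10_train    = datasets.STL10("./data", split="train", download=True, transform=to_tensor)
svhn_train     = datasets.SVHN("./data", split="train", download=True, transform=to_tensor)
```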
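The Experiment Setup row lists the hyperparameter ranges verbatim. The sketch below writes them out as a Python search space; whether the authors swept the full Cartesian product or tuned each value per experiment is not stated, so the exhaustive grid (and the `train_and_evaluate` entry point) is only one possible, hypothetical reading.

```python
from itertools import product

# Search space quoted in the Experiment Setup row.
search_space = {
    "batch_size":   [8, 16, 32, 64, 128, 256, 512],
    "lr":           [1e-4, 1e-3, 1e-2, 1e-1],
    "weight_decay": [1e-3, 1e-4, 1e-5, 1e-6],   # ℓ2-norm regularization
    "sgld_noise":   [1e-4, 1e-6, 1e-8, 1e-10],
    "beta":         [1e-1, 1e-2, 1e-3],          # β of PAC-Bayes IB
    "dropout":      [0.1],                       # fixed for the dropout baseline
}

# One possible reading: an exhaustive grid over all listed values.
for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    # train_and_evaluate(config)  # hypothetical training entry point
```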