Recognizable Information Bottleneck

Authors: Yilin Lyu, Xin Liu, Mingyang Song, Xinyue Wang, Yaxin Peng, Tieyong Zeng, Liping Jing

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several commonly used datasets demonstrate the effectiveness of the proposed method in regularizing the model and estimating the generalization gap.
Researcher Affiliation | Academia | ¹Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University; ²Department of Mathematics, School of Science, Shanghai University; ³The Chinese University of Hong Kong. {yilinlyu, xin.liu, mingyang.song, xinyuewang, lpjing}@bjtu.edu.cn, yaxin.peng@shu.edu.cn, zeng@math.cuhk.edu.hk
Pseudocode | Yes | Algorithm 1: Optimization of Recognizable Information Bottleneck (RIB)
Open Source Code | Yes | Code is available at https://github.com/lvyilin/RecogIB.
Open Datasets | Yes | Our experiments are mainly conducted on three widely used datasets: Fashion-MNIST [Xiao et al., 2017], SVHN [Netzer et al., 2011] and CIFAR10 [Krizhevsky and Hinton, 2009]. We also give the results on MNIST and STL10 [Coates et al., 2011] in Appendix C.
Dataset Splits | No | No explicit training/validation/test splits are given in the paper. It mentions using the validation set as a ghost set ('Unless otherwise stated, the validation set is used as the ghost set.') and references an external paper for the data setting, but does not specify the splits in detail. An illustrative split sketch follows the table.
Hardware Specification | Yes | All the experiments are implemented with PyTorch and performed on eight NVIDIA RTX A4000 GPUs.
Software Dependencies | No | The paper states 'All the experiments are implemented with PyTorch' but does not specify the version number of PyTorch or any other software dependencies.
Experiment Setup | Yes | We use a DNN model composed of a 4-layer CNN (128-128-256-1024) and a 2-layer MLP (1024-512) as the encoder, and a 4-layer MLP (1024-1024-1) as the recognizability critic. We train the learning model using the Adam optimizer [Kingma and Ba, 2015] with betas of (0.9, 0.999) and train the recognizability critic using SGD with momentum of 0.9, as it is more stable in practice. All learning rates are set to 0.001, and the models are trained for 100 epochs using the cosine annealing learning rate scheme with a batch size of 128. The trade-off parameter β is selected from {10⁻¹, 10⁰, 10¹, 10²} according to the desired regularization strength, as discussed later in the paper.
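To make the setup row concrete, below is a minimal PyTorch sketch of the encoder, critic, and optimizers. The paper specifies only the layer widths and the optimizer settings quoted above; kernel sizes, pooling, activations, the critic's input width, and the exact layer-count interpretation are assumptions.

import torch
import torch.nn as nn

# Encoder: 4-layer CNN (128-128-256-1024) followed by a 2-layer MLP (1024-512).
# Kernel sizes, pooling, and activations are assumptions; the paper lists only
# the widths. in_channels=3 fits SVHN/CIFAR10 (use 1 for Fashion-MNIST).
class Encoder(nn.Module):
    def __init__(self, in_channels=3, feat_dim=512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 128, 3, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(256, 1024, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (B, 1024, 1, 1)
        )
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, feat_dim),
        )

    def forward(self, x):
        return self.mlp(self.conv(x))

# Recognizability critic with the quoted widths (1024-1024-1). Feeding it the
# encoder's 512-dim representation is an assumption about its input.
critic = nn.Sequential(
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 1024), nn.ReLU(),
    nn.Linear(1024, 1),
)

encoder = Encoder()

# Optimizer settings quoted from the paper: Adam (betas 0.9/0.999) for the
# model, SGD with momentum 0.9 for the critic, all learning rates 0.001,
# cosine annealing over the 100 training epochs, batch size 128.
opt_model = torch.optim.Adam(encoder.parameters(), lr=1e-3, betas=(0.9, 0.999))
opt_critic = torch.optim.SGD(critic.parameters(), lr=1e-3, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(opt_model, T_max=100)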
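Since the paper leaves the splits unspecified (see the Dataset Splits row), the following sketch shows one plausible way to set up the data: carve a validation set out of the standard torchvision training split and reuse it as the ghost set, per the quote 'the validation set is used as the ghost set.' The 90/10 ratio and the random seed are assumptions, not values from the paper.

import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # normalization/augmentation omitted

full_train = datasets.FashionMNIST("data", train=True, download=True, transform=transform)
test_set = datasets.FashionMNIST("data", train=False, download=True, transform=transform)

# The paper does not state split sizes; a 90/10 train/validation split is an
# assumption. Per the paper, the validation set doubles as the ghost set.
n_val = len(full_train) // 10
train_set, val_set = random_split(
    full_train,
    [len(full_train) - n_val, n_val],
    generator=torch.Generator().manual_seed(0),
)
ghost_set = val_set  # "the validation set is used as the ghost set"

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
ghost_loader = DataLoader(ghost_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=128)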