Deep Variational Information Bottleneck

Authors: Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present various experimental results, comparing the behavior of standard deterministic networks to stochastic neural networks trained by optimizing the VIB objective. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack. (A sketch of this objective appears below the table.)
Researcher Affiliation | Industry | Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy, Google Research, {alemi,iansf,jvdillon,kpmurphy}@google.com
Pseudocode | No | The paper provides mathematical derivations and descriptions of the proposed method, but it does not include any formally presented pseudocode or algorithm blocks (e.g., in a labeled figure or environment).
Open Source Code | No | The paper mentions that 'Carlini & Wagner (2016) shared their code with us' and refers to 'publicly available, pretrained checkpoints' of other models, but it does not explicitly state that the authors' own source code for the VIB method is publicly available or provided.
Open Datasets | Yes | We start with experiments on unmodified MNIST (i.e. no data augmentation). We make use of publicly available, pretrained checkpoints of Inception Resnet V2 (Szegedy et al., 2016) on ImageNet (Deng et al., 2009).
Dataset Splits | Yes | For the MNIST experiments, a batch size of 100 was used, and the full 60,000 training and validation set was used for training, and the 10,000 test images for test results.
Hardware Specification | No | The paper mentions training with TensorFlow, which implies computational resources, but it does not specify any particular hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies | No | The paper states 'All of the networks for this paper were trained using TensorFlow (Abadi et al., 2016)' and 'The Adam optimizer (Kingma & Ba, 2015) was used'. While these are software components, specific version numbers for TensorFlow or the Adam optimizer are not provided.
Experiment Setup | Yes | The networks were trained using TensorFlow for 200 epochs using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001. Full hyperparameter details can be found in Appendix A. Appendix A details: initial learning rate of 10^-4, (β1 = 0.5, β2 = 0.999) and exponential decay, decaying the learning rate by a factor of 0.97 every 2 epochs. The networks were all trained for 200 epochs total. For the MNIST experiments, a batch size of 100 was used... The input images were scaled to have values between -1 and 1 before being fed to the network. (A sketch of this setup appears below the table.)
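
The Research Type row refers to training stochastic networks by optimizing the VIB objective. The following is a minimal sketch of that objective as a cross-entropy term plus a β-weighted KL term between a diagonal-Gaussian encoder and a standard-normal prior; the function names, argument names, and default β below are illustrative choices, not taken from the authors' code.

```python
import tensorflow as tf

def sample_z(mu, sigma):
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    # so gradients flow through mu and sigma.
    return mu + sigma * tf.random.normal(tf.shape(mu))

def vib_loss(logits, labels, mu, sigma, beta=1e-3):
    """Cross-entropy plus a beta-weighted KL term between the encoder
    distribution N(mu, diag(sigma^2)) and the prior r(z) = N(0, I).
    Names and the default beta are illustrative, not the paper's code."""
    # Classification term: batch estimate of E[-log q(y|z)].
    ce = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=logits))
    # Closed-form KL divergence for diagonal Gaussians, summed over the
    # latent dimensions and averaged over the batch.
    kl = tf.reduce_mean(
        tf.reduce_sum(
            0.5 * (tf.square(mu) + tf.square(sigma)
                   - 2.0 * tf.math.log(sigma) - 1.0),
            axis=1))
    return ce + beta * kl
```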
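For the Experiment Setup row, the sketch below expresses the reported configuration (Adam with initial learning rate 10^-4, β1 = 0.5, β2 = 0.999, exponential decay by 0.97 every 2 epochs, batch size 100 on MNIST, inputs rescaled to [-1, 1]) in modern TensorFlow/Keras. The paper predates these APIs, so the specific calls are an assumption about how one would reproduce the setup today.

```python
import tensorflow as tf

# Reported setup: Adam, initial learning rate 1e-4, beta1 = 0.5,
# beta2 = 0.999, learning rate decayed by a factor of 0.97 every
# 2 epochs, 200 epochs total, batch size 100 on MNIST.
BATCH_SIZE = 100
STEPS_PER_EPOCH = 60_000 // BATCH_SIZE  # full 60k train+validation set

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=2 * STEPS_PER_EPOCH,  # one decay step every 2 epochs
    decay_rate=0.97,
    staircase=True)

optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule, beta_1=0.5, beta_2=0.999)

def preprocess(images):
    # Scale uint8 MNIST pixels from [0, 255] to [-1, 1] before they
    # are fed to the network.
    return tf.cast(images, tf.float32) / 127.5 - 1.0
```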