Deep Variational Information Bottleneck
Authors: Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present various experimental results, comparing the behavior of standard deterministic networks to stochastic neural networks trained by optimizing the VIB objective. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack. |
| Researcher Affiliation | Industry | Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy Google Research {alemi,iansf,jvdillon,kpmurphy}@google.com |
| Pseudocode | No | The paper provides mathematical derivations and descriptions of the proposed method, but it does not include any formally presented pseudocode or algorithm blocks (e.g., in a labeled figure or environment). |
| Open Source Code | No | The paper mentions that 'Carlini & Wagner (2016) shared their code with us' and refers to 'publicly available, pretrained checkpoints' of other models, but it does not explicitly state that the authors' own source code for the VIB method is publicly available or provided. |
| Open Datasets | Yes | We start with experiments on unmodified MNIST (i.e. no data augmentation). We make use of publicly available, pretrained checkpoints of Inception ResNet V2 (Szegedy et al., 2016) on ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | For the MNIST experiments, a batch size of 100 was used, and the full 60,000 training and validation set was used for training, and the 10,000 test images for test results. |
| Hardware Specification | No | The paper mentions training with TensorFlow, which implies the use of computational resources, but it does not specify any particular hardware details such as GPU models, CPU types, or memory specifications used for the experiments. |
| Software Dependencies | No | The paper states 'All of the networks for this paper were trained using TensorFlow (Abadi et al., 2016)' and 'The Adam optimizer (Kingma & Ba, 2015) was used'. While these are software components, specific version numbers for TensorFlow or the Adam optimizer are not provided. |
| Experiment Setup | Yes | The networks were trained using TensorFlow for 200 epochs using the Adam optimizer (Kingma & Ba, 2015) with a learning rate of 0.0001. Full hyperparameter details can be found in Appendix A. Appendix A details: an initial learning rate of 10⁻⁴, (β₁ = 0.5, β₂ = 0.999) and exponential decay, decaying the learning rate by a factor of 0.97 every 2 epochs. The networks were all trained for 200 epochs total. For the MNIST experiments, a batch size of 100 was used... The input images were scaled to have values between -1 and 1 before being fed to the network. (A hedged configuration sketch follows the table.) |
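
The experiment setup row quotes the paper's Appendix A hyperparameters: Adam with an initial learning rate of 10⁻⁴ and (β₁ = 0.5, β₂ = 0.999), exponential decay by 0.97 every 2 epochs, 200 epochs, batch size 100, and inputs scaled to [-1, 1]. The snippet below is a minimal sketch of that configuration using the standard `tf.keras` optimizer and schedule APIs. The model architecture and the VIB objective itself are not reproduced here, and expressing the "every 2 epochs" decay as a staircase schedule over optimizer steps is an assumption, not something stated in the paper.

```python
# Hedged sketch of the training configuration quoted above (Appendix A of the
# paper). Only the optimizer, learning-rate schedule, and input scaling are
# taken from the quoted text; everything else (API choice, staircase decay)
# is an assumption.
import tensorflow as tf

BATCH_SIZE = 100
EPOCHS = 200
STEPS_PER_EPOCH = 60_000 // BATCH_SIZE  # full MNIST train+validation set

# Exponential decay: factor 0.97 every 2 epochs, approximated per optimizer step.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=2 * STEPS_PER_EPOCH,
    decay_rate=0.97,
    staircase=True,
)

optimizer = tf.keras.optimizers.Adam(
    learning_rate=lr_schedule, beta_1=0.5, beta_2=0.999
)

def preprocess(image, label):
    # Scale pixel values from [0, 255] to [-1, 1], as described in Appendix A.
    image = tf.cast(image, tf.float32) / 127.5 - 1.0
    return image, label
```

With a batch size of 100 and 60,000 training images, `decay_steps = 2 * 600 = 1200` optimizer steps corresponds to the "every 2 epochs" decay interval quoted above.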