Understanding the Limitations of Variational Mutual Information Estimators
Authors: Jiaming Song, Stefano Ermon
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also empirically demonstrate that existing estimators fail to satisfy basic self-consistency properties of MI, such as data processing and additivity under independence. Based on a unified perspective of variational approaches, we develop a new estimator that focuses on variance reduction. Empirical results on standard benchmark tasks demonstrate that our proposed estimator exhibits improved bias-variance trade-offs. |
| Researcher Affiliation | Academia | Jiaming Song & Stefano Ermon, Stanford University, {tsong, ermon}@cs.stanford.edu |
| Pseudocode | No | No pseudocode or clearly labeled algorithm block was found. |
| Open Source Code | No | The extracted text contains only the truncated footnote "We release our code in", without a usable URL or repository link. |
| Open Datasets | Yes | Next, we perform our proposed self-consistency tests on high-dimensional images (MNIST and CIFAR10) |
| Dataset Splits | No | The paper mentions training models for a specific number of iterations and with a batch size, but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper describes the software used (e.g., Adam optimizer) and training parameters, but does not provide specific details regarding the hardware (e.g., GPU models, CPU types) on which the experiments were run. |
| Software Dependencies | No | The paper mentions using specific models and optimizers like 'Adam optimizer (Kingma & Ba, 2014)' and 'invertible flow models (Dinh et al., 2016)', but it does not specify software library names with their version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0'). |
| Experiment Setup | Yes | Architecture and training procedure: For all the discriminative methods, we consider two types of architectures, joint and separable. The joint architecture concatenates the inputs x, y, and then passes through a two-layer MLP with 256 neurons in each layer and ReLU activations at each layer... For all the cases, we use the Adam optimizer (Kingma & Ba, 2014) with learning rate 5 × 10⁻⁴, β1 = 0.9, β2 = 0.999, and train for 20k iterations with a batch size of 64, following the setup in Poole et al. (2019). |
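
Since no pseudocode is provided (see the "Pseudocode" row), the following is a minimal PyTorch sketch of the joint and separable critic architectures and the Adam configuration quoted in the "Experiment Setup" row. Everything beyond the quoted hyperparameters (class names, input dimensions, the embedding size of the separable critic) is an assumption for illustration, not the authors' released implementation.

```python
# Minimal sketch of the two critic architectures described above; details
# beyond the quoted hyperparameters are assumptions, not the authors' code.
import torch
import torch.nn as nn


def mlp(in_dim, hidden=256, out_dim=1):
    # Two hidden layers with 256 units and ReLU activations, as quoted above.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class JointCritic(nn.Module):
    """Concatenates (x, y) and scores the pair with a single MLP."""
    def __init__(self, x_dim, y_dim):
        super().__init__()
        self.net = mlp(x_dim + y_dim)

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1))  # shape: (batch, 1)


class SeparableCritic(nn.Module):
    """Embeds x and y separately and scores pairs by inner product."""
    def __init__(self, x_dim, y_dim, embed_dim=32):  # embed_dim is assumed
        super().__init__()
        self.f = mlp(x_dim, out_dim=embed_dim)
        self.g = mlp(y_dim, out_dim=embed_dim)

    def forward(self, x, y):
        # Full (batch, batch) score matrix f(x) g(y)^T, the form that
        # contrastive MI bounds typically consume.
        return self.f(x) @ self.g(y).t()


# Training configuration quoted in the table: Adam, lr = 5e-4,
# betas = (0.9, 0.999), 20k iterations, batch size 64.
critic = JointCritic(x_dim=20, y_dim=20)  # dimensions here are illustrative
optimizer = torch.optim.Adam(critic.parameters(), lr=5e-4, betas=(0.9, 0.999))
```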
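
The abstract quoted in the "Research Type" row refers to two self-consistency properties of MI: the data processing inequality and additivity under independence. The sketch below shows what such checks could look like for a generic estimator; the callable `estimate_mi(x, y) -> float` and the tolerances are hypothetical and do not reproduce the paper's exact test protocol.

```python
# Hedged sketch of the self-consistency checks named in the abstract quote.
import torch


def passes_data_processing(estimate_mi, x, y, h, slack=0.1):
    # Data processing inequality: I(X; h(Y)) <= I(X; Y) for any mapping h,
    # so the estimate on transformed data should not noticeably exceed the
    # estimate on raw data (a small slack absorbs estimation noise).
    return estimate_mi(x, h(y)) <= estimate_mi(x, y) + slack


def passes_additivity(estimate_mi, x1, y1, x2, y2, rel_tol=0.1):
    # Additivity under independence: if (X1, Y1) and (X2, Y2) are independent
    # copies, then I([X1, X2]; [Y1, Y2]) = I(X1; Y1) + I(X2; Y2).
    joint = estimate_mi(torch.cat([x1, x2], dim=-1),
                        torch.cat([y1, y2], dim=-1))
    separate = estimate_mi(x1, y1) + estimate_mi(x2, y2)
    return abs(joint - separate) <= rel_tol * max(abs(separate), 1.0)
```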