Understanding the Limitations of Variational Mutual Information Estimators

Authors: Jiaming Song, Stefano Ermon

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also empirically demonstrate that existing estimators fail to satisfy basic self-consistency properties of MI, such as data processing and additivity under independence. Based on a unified perspective of variational approaches, we develop a new estimator that focuses on variance reduction. Empirical results demonstrate that our proposed estimator exhibits improved bias-variance trade-offs on standard benchmark tasks. (These two properties are stated formally below the table.)
Researcher Affiliation | Academia | Jiaming Song & Stefano Ermon, Stanford University, {tsong, ermon}@cs.stanford.edu
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found.
Open Source Code | No | We release our code in
Open Datasets | Yes | Next, we perform our proposed self-consistency tests on high-dimensional images (MNIST and CIFAR10)
Dataset Splits | No | The paper mentions training models for a specific number of iterations and with a batch size, but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper describes the software used (e.g., Adam optimizer) and training parameters, but does not provide specific details regarding the hardware (e.g., GPU models, CPU types) on which the experiments were run.
Software Dependencies | No | The paper mentions using specific models and optimizers like 'Adam optimizer (Kingma & Ba, 2014)' and 'invertible flow models (Dinh et al., 2016)', but it does not specify software library names with their version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0').
Experiment Setup | Yes | Architecture and training procedure: For all the discriminative methods, we consider two types of architectures: joint and separable. The joint architecture concatenates the inputs x, y and then passes through a two-layer MLP with 256 neurons in each layer, with ReLU activations at each layer... For all the cases, we use the Adam optimizer (Kingma & Ba, 2014) with learning rate 5 × 10^-4 and β1 = 0.9, β2 = 0.999, and train for 20k iterations with a batch size of 64, following the setup in Poole et al. (2019).
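
As a concrete reference for the Experiment Setup row, here is a minimal sketch of how the quoted setup could be implemented. It assumes PyTorch; the class names, input dimension, and the separable critic's embedding size are illustrative assumptions, not the authors' released code. The hidden sizes, activations, and Adam hyperparameters follow the quoted description (two-layer MLP, 256 units, ReLU, lr 5e-4, betas (0.9, 0.999), batch size 64, 20k iterations).

```python
# Sketch only: names and dimensions are illustrative, not the authors' code.
import torch
import torch.nn as nn


class JointCritic(nn.Module):
    """Concatenates (x, y) and scores the pair with a 2-layer MLP (256 units, ReLU)."""

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)


class SeparableCritic(nn.Module):
    """Embeds x and y separately and scores all pairs with inner products.

    The embedding size (32) is an assumption; the quoted excerpt only details
    the joint architecture.
    """

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 256, embed: int = 32):
        super().__init__()

        def mlp(d_in):
            return nn.Sequential(
                nn.Linear(d_in, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, embed),
            )

        self.f, self.g = mlp(x_dim), mlp(y_dim)

    def forward(self, x, y):
        # Returns a [batch, batch] matrix of scores f(x_i)^T g(y_j).
        return self.f(x) @ self.g(y).t()


critic = JointCritic(x_dim=20, y_dim=20)  # 20-d inputs are a placeholder
optimizer = torch.optim.Adam(critic.parameters(), lr=5e-4, betas=(0.9, 0.999))
# A training loop (not shown) would run for 20k iterations with batch size 64,
# maximizing a variational MI lower bound computed from the critic scores.
```

The critic scores are the only ingredient the variational bounds need, so either architecture can be swapped in without changing the training configuration above.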
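
For context on the Research Type row: the two self-consistency properties it references are standard identities of mutual information. A minimal formal statement follows (the paper's concrete tests on images may be phrased differently):

```latex
% Data processing: a deterministic or stochastic map g applied to Y cannot
% increase the information Y carries about X.
I(X; Y) \;\geq\; I\bigl(X; g(Y)\bigr)

% Additivity under independence: if (X_1, Y_1) and (X_2, Y_2) are independent
% copies with the same joint distribution as (X, Y), then
I\bigl([X_1, X_2]; [Y_1, Y_2]\bigr) \;=\; I(X_1; Y_1) + I(X_2; Y_2) \;=\; 2\, I(X; Y)
```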