Understanding the Limitations of Variational Mutual Information Estimators

Authors: Jiaming Song, Stefano Ermon

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also empirically demonstrate that existing estimators fail to satisfy basic self-consistency properties of MI, such as data processing and additivity under independence. Based on a unified perspective of variational approaches, we develop a new estimator that focuses on variance reduction. Empirical results demonstrate that our proposed estimator exhibits improved bias-variance trade-offs on standard benchmark tasks. (These two properties are stated formally below the table.)
Researcher Affiliation | Academia | Jiaming Song & Stefano Ermon, Stanford University, {tsong, ermon}@cs.stanford.edu
Pseudocode | No | No pseudocode or clearly labeled algorithm block was found.
Open Source Code | No | We release our code in
Open Datasets | Yes | Next, we perform our proposed self-consistency tests on high-dimensional images (MNIST and CIFAR10)
Dataset Splits | No | The paper mentions training models for a specific number of iterations and with a batch size, but does not provide specific train/validation/test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper describes the software used (e.g., Adam optimizer) and training parameters, but does not provide specific details regarding the hardware (e.g., GPU models, CPU types) on which the experiments were run.
Software Dependencies | No | The paper mentions using specific models and optimizers like 'Adam optimizer (Kingma & Ba, 2014)' and 'invertible flow models (Dinh et al., 2016)', but it does not specify software library names with their version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.0').
Experiment Setup | Yes | Architecture and training procedure: For all the discriminative methods, we consider two types of architectures: joint and separable. The joint architecture concatenates the inputs x, y and then passes through a two-layer MLP with 256 neurons in each layer, with ReLU activations at each layer... For all the cases, we use the Adam optimizer (Kingma & Ba, 2014) with learning rate 5 × 10^-4 and β1 = 0.9, β2 = 0.999, and train for 20k iterations with a batch size of 64, following the setup in Poole et al. (2019).
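
As a concrete reference for the Experiment Setup row, here is a minimal sketch of how the quoted setup could be implemented. It assumes PyTorch; the class names, input dimension, and the separable critic's embedding size are illustrative assumptions, not the authors' released code. The hidden sizes, activations, and Adam hyperparameters follow the quoted description (two-layer MLP, 256 units, ReLU, lr 5e-4, betas (0.9, 0.999), batch size 64, 20k iterations).

```python
# Sketch only: names and dimensions are illustrative, not the authors' code.
import torch
import torch.nn as nn


class JointCritic(nn.Module):
    """Concatenates (x, y) and scores the pair with a 2-layer MLP (256 units, ReLU)."""

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)


class SeparableCritic(nn.Module):
    """Embeds x and y separately and scores all pairs with inner products.

    The embedding size (32) is an assumption; the quoted excerpt only details
    the joint architecture.
    """

    def __init__(self, x_dim: int, y_dim: int, hidden: int = 256, embed: int = 32):
        super().__init__()

        def mlp(d_in):
            return nn.Sequential(
                nn.Linear(d_in, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, embed),
            )

        self.f, self.g = mlp(x_dim), mlp(y_dim)

    def forward(self, x, y):
        # Returns a [batch, batch] matrix of scores f(x_i)^T g(y_j).
        return self.f(x) @ self.g(y).t()


critic = JointCritic(x_dim=20, y_dim=20)  # 20-d inputs are a placeholder
optimizer = torch.optim.Adam(critic.parameters(), lr=5e-4, betas=(0.9, 0.999))
# A training loop (not shown) would run for 20k iterations with batch size 64,
# maximizing a variational MI lower bound computed from the critic scores.
```

The critic scores are the only ingredient the variational bounds need, so either architecture can be swapped in without changing the training configuration above.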
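
For context on the Research Type row: the two self-consistency properties it references are standard identities of mutual information. A minimal formal statement follows (the paper's concrete tests on images may be phrased differently):

```latex
% Data processing: a deterministic or stochastic map g applied to Y cannot
% increase the information Y carries about X.
I(X; Y) \;\geq\; I\bigl(X; g(Y)\bigr)

% Additivity under independence: if (X_1, Y_1) and (X_2, Y_2) are independent
% copies with the same joint distribution as (X, Y), then
I\bigl([X_1, X_2]; [Y_1, Y_2]\bigr) \;=\; I(X_1; Y_1) + I(X_2; Y_2) \;=\; 2\, I(X; Y)
```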