Mutual Information Neural Estimation

Authors: Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4, Empirical comparisons: "Before diving into applications, we perform some simple empirical evaluation and comparisons of MINE. The objective is to show that MINE is effectively able to estimate mutual information and account for non-linear dependence." (The estimated bound is restated below the table.)
Researcher Affiliation | Academia | (1) Montréal Institute for Learning Algorithms (MILA), University of Montréal; (2) Department of Mathematics and Statistics, McGill University; (3) Canadian Institute for Advanced Research (CIFAR); (4) The Institute for Data Valorization (IVADO).
Pseudocode | Yes | Algorithm 1: MINE (see the training-step sketch below the table).
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | Experiment: Stacked MNIST. "Following Che et al. (2016); Metz et al. (2017); Srivastava et al. (2017); Lin et al. (2017), we quantitatively assess MINE's ability to diminish mode dropping on the stacked MNIST dataset, which is constructed by stacking three randomly sampled MNIST digits. We train MINE on datasets of increasing order of complexity: a toy dataset composed of 25 Gaussians, MNIST (LeCun, 1998), and the CelebA dataset (Liu et al., 2015)." (A construction sketch for stacked MNIST appears below the table.)
Dataset Splits | No | The paper mentions using datasets like MNIST and CelebA but does not provide specific train/validation/test split percentages, sample counts, or citations to predefined splits for its experiments. While it mentions a pre-trained classifier on 26,000 samples, this does not detail the authors' own data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory specifications) used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, used in the experiments.
Experiment Setup | Yes | "We demonstrate using Eqn. 17 on the spiral and the 25-Gaussians datasets, comparing two models, one with β = 0 (which corresponds to the orthodox GAN as in Goodfellow et al. (2014)) and one with β = 1.0, which corresponds to mutual information maximization. Since the mutual information is theoretically unbounded, we use adaptive gradient clipping (see the Supplementary Material) to ensure that the generator receives learning signals similar in magnitude from the discriminator and the statistics network." (A sketch of the clipped generator update appears below the table.)
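
For context on the empirical-comparisons row above: the quantity MINE estimates is the Donsker-Varadhan lower bound on mutual information, restated here in lightly simplified notation (the statistics network T_θ is optimized over its parameter set Θ):

```latex
I(X;Z) \;\geq\; \sup_{\theta \in \Theta} \; \mathbb{E}_{\mathbb{P}_{XZ}}\!\left[ T_\theta \right] \;-\; \log \mathbb{E}_{\mathbb{P}_X \otimes \mathbb{P}_Z}\!\left[ e^{T_\theta} \right]
```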
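Since Algorithm 1 is cited but not reproduced on this page, below is a minimal PyTorch-style sketch of one MINE update in the spirit of that algorithm. The network architecture, EMA rate, and optimizer handling are assumptions, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """Small MLP T_theta(x, z) -> scalar score (architecture is illustrative)."""
    def __init__(self, x_dim, z_dim, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mine_step(T, optimizer, x, z, ema, ema_rate=0.01):
    """One gradient step on the Donsker-Varadhan lower bound.

    x, z: paired minibatch drawn from the joint P_XZ.
    The marginal minibatch is obtained by shuffling z to break the pairing.
    `ema` tracks E[e^T] to reduce the bias of the minibatch gradient.
    """
    z_marginal = z[torch.randperm(z.size(0))]
    t_joint = T(x, z).mean()                           # E_{P_XZ}[T_theta]
    e_t_marginal = torch.exp(T(x, z_marginal)).mean()  # E_{P_X (x) P_Z}[e^{T_theta}]
    mi_lower_bound = t_joint - torch.log(e_t_marginal)

    # Bias-corrected surrogate: dividing by a moving average of E[e^T]
    # rather than the minibatch estimate gives a less biased gradient.
    ema = (1.0 - ema_rate) * ema + ema_rate * e_t_marginal.detach()
    loss = -(t_joint - e_t_marginal / ema)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return mi_lower_bound.item(), ema
```

A training loop would initialize `ema = torch.tensor(1.0)`, build an `Adam` optimizer over `T.parameters()`, and call `mine_step` once per minibatch, reading off `mi_lower_bound` as the running MI estimate.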
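The stacked MNIST dataset referenced in the Open Datasets row is built by stacking three randomly sampled digits into three channels. A hedged NumPy sketch of that construction follows; array shapes and the sampling procedure are assumptions, since the paper defers to prior work for the exact setup.

```python
import numpy as np

def make_stacked_mnist(images, n_samples, seed=None):
    """Stack three randomly drawn 28x28 MNIST digits along the channel axis,
    giving one of 1000 possible three-digit 'modes' per sample.
    `images` is assumed to be an (N, 28, 28) array of MNIST digits."""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(images), size=(n_samples, 3))
    # Output shape: (n_samples, 3, 28, 28), one digit per channel.
    return np.stack([images[idx[:, c]] for c in range(3)], axis=1)
```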
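For the Experiment Setup row, here is a rough sketch of how the β-weighted mutual-information bonus and adaptive gradient clipping might be combined in the generator update. The paper's exact clipping rule is in its Supplementary Material, so the helper name, per-parameter handling, and epsilon constant below are assumptions.

```python
import torch

def generator_gradients(gan_loss, mi_estimate, params, beta=1.0):
    """Combine the standard GAN generator loss with a MINE mutual-information
    bonus (beta = 0 recovers the orthodox GAN). The gradient of the unbounded
    MI term is rescaled so its norm never exceeds that of the GAN gradient."""
    params = list(params)
    grads_gan = torch.autograd.grad(gan_loss, params, retain_graph=True)
    grads_mi = torch.autograd.grad(-beta * mi_estimate, params, retain_graph=True)

    gan_norm = torch.sqrt(sum((g ** 2).sum() for g in grads_gan))
    mi_norm = torch.sqrt(sum((g ** 2).sum() for g in grads_mi))
    scale = torch.clamp(gan_norm / (mi_norm + 1e-8), max=1.0)

    # The generator is then updated with grads_gan + scale * grads_mi.
    return [gg + scale * gm for gg, gm in zip(grads_gan, grads_mi)]
```

With `beta = 0` this reduces to the ordinary GAN generator gradient; with `beta = 1.0` the MI bonus is added but its gradient norm is capped at that of the GAN term, matching the stated goal of keeping the two learning signals similar in magnitude.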