Mutual Information Neural Estimation
Authors: Mohamed Ishmael Belghazi, Aristide Baratin, Sai Rajeshwar, Sherjil Ozair, Yoshua Bengio, Aaron Courville, Devon Hjelm
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Empirical comparisons): "Before diving into applications, we perform some simple empirical evaluation and comparisons of MINE. The objective is to show that MINE is effectively able to estimate mutual information and account for non-linear dependence." |
| Researcher Affiliation | Academia | (1) Montréal Institute for Learning Algorithms (MILA), University of Montréal; (2) Department of Mathematics and Statistics, McGill University; (3) Canadian Institute for Advanced Research (CIFAR); (4) The Institute for Data Valorization (IVADO). |
| Pseudocode | Yes | Algorithm 1 MINE (a hedged sketch of the training loop it describes appears below the table). |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | Experiment: Stacked MNIST. "Following Che et al. (2016); Metz et al. (2017); Srivastava et al. (2017); Lin et al. (2017), we quantitatively assess MINE's ability to diminish mode dropping on the stacked MNIST dataset, which is constructed by stacking three randomly sampled MNIST digits. We train MINE on datasets of increasing order of complexity: a toy dataset composed of 25 Gaussians, MNIST (LeCun, 1998), and the CelebA dataset (Liu et al., 2015)." |
| Dataset Splits | No | The paper mentions using datasets like MNIST and CelebA but does not provide specific train/validation/test split percentages, sample counts, or citations to predefined splits for its experiments. While it mentions a pre-trained classifier on 26,000 samples, this does not describe the authors' own data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory specifications) used for conducting the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers, used in the experiments. |
| Experiment Setup | Yes | "We demonstrate using Eqn. 17 on the spiral and the 25-Gaussians datasets, comparing two models: one with β = 0 (which corresponds to the orthodox GAN, as in Goodfellow et al. (2014)) and one with β = 1.0, which corresponds to mutual information maximization. Since the mutual information is theoretically unbounded, we use adaptive gradient clipping (see the Supplementary Material) to ensure that the generator receives learning signals similar in magnitude from the discriminator and the statistics network." A hedged sketch of such clipping follows the table. |
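
The MINE procedure referenced in the Pseudocode row trains a statistics network T_θ(x, z) to maximize the Donsker-Varadhan lower bound I(X;Z) ≥ E_P[T_θ] − log E_{P⊗Q}[exp(T_θ)], where the product of marginals is simulated by shuffling one variable within the minibatch. Below is a minimal PyTorch sketch of one training step under that bound; the network architecture, `hidden` width, learning rate, and `ema_rate` used for the bias-corrected gradient are illustrative assumptions, not values taken from the paper.

```python
# Hedged sketch of a MINE training step (Algorithm 1 in the paper),
# assuming a PyTorch setup. Names and hyperparameters are illustrative.
import torch
import torch.nn as nn

class StatisticsNet(nn.Module):
    """T_theta(x, z): maps a concatenated (x, z) pair to a scalar statistic."""
    def __init__(self, dim_x, dim_z, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_z, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=1))

def mine_step(T, opt, x, z, ema, ema_rate=0.01):
    """One ascent step on I(X;Z) >= E_P[T] - log E_{PxQ}[exp(T)].
    `ema` is a running estimate of E[exp(T)] used to reduce gradient bias."""
    z_marg = z[torch.randperm(z.size(0))]   # shuffle z to simulate p(x)p(z)
    t_joint = T(x, z).mean()
    exp_marg = torch.exp(T(x, z_marg)).mean()
    ema = (1 - ema_rate) * ema + ema_rate * exp_marg.detach()
    # Bias-corrected surrogate: gradient of the log-term estimated via the EMA.
    loss = -(t_joint - exp_marg / ema)       # negate to ascend the bound
    opt.zero_grad()
    loss.backward()
    opt.step()
    mi_estimate = (t_joint - exp_marg.log()).item()
    return mi_estimate, ema

# Usage (illustrative):
#   T = StatisticsNet(dim_x=1, dim_z=1)
#   opt = torch.optim.Adam(T.parameters(), lr=1e-4)
#   ema = torch.tensor(1.0)
#   mi, ema = mine_step(T, opt, x_batch, z_batch, ema)
```

The EMA-based correction reflects the paper's observation that the naive minibatch gradient of the log-expectation term is biased; replacing the denominator with a moving average is the fix Algorithm 1 prescribes.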
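
For the Experiment Setup row, the paper defers the details of adaptive gradient clipping to its Supplementary Material. The sketch below is one plausible reading, assuming the rule rescales the mutual-information gradient so its norm never exceeds that of the discriminator gradient; the function name `clipped_generator_grads` and the exact rescaling rule are assumptions, not the paper's recipe.

```python
# Hedged sketch of adaptive gradient clipping for a generator trained with
# loss_disc (discriminator signal) + beta * loss_mi (MI maximization signal).
# The rescaling rule here is an assumption, not the paper's exact method.
import torch

def clipped_generator_grads(gen_params, loss_disc, loss_mi, beta=1.0):
    g_d = torch.autograd.grad(loss_disc, gen_params, retain_graph=True)
    g_m = torch.autograd.grad(beta * loss_mi, gen_params, retain_graph=True)
    norm_d = torch.sqrt(sum((g ** 2).sum() for g in g_d))
    norm_m = torch.sqrt(sum((g ** 2).sum() for g in g_m))
    # Scale the MI gradient so it never dominates the discriminator signal.
    scale = torch.clamp(norm_d / (norm_m + 1e-8), max=1.0)
    return [gd + scale * gm for gd, gm in zip(g_d, g_m)]

# Usage (illustrative): assign the combined grads and step an optimizer:
#   for p, g in zip(gen_params, grads): p.grad = g
#   opt_gen.step()
```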