Mutual Information Estimation via Normalizing Flows

Authors: Ivan Butakov, Aleksandr Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov

NeurIPS 2024

Reproducibility variables, results, and supporting LLM responses:

Research Type: Experimental
"Experiments with high-dimensional data are conducted to highlight the practical advantages of the proposed method. In Section 5, a series of experiments is performed to evaluate the proposed method and compare it to several other key MI estimators."

Researcher Affiliation: Academia
"Butakov I. D. (Skoltech, MIPT, Sirius), butakov.id@phystech.su; Tolmachev A. D. (Skoltech, MIPT), tolmachev.ad@phystech.su; Malanchuk S. V. (MIPT, Skoltech), malanchuk.sv@phystech.su; Neopryatnaya A. M. (MIPT, Skoltech), neopryatnaya.am@skoltech.ru; Frolov A. A. (Skoltech), al.frolov@skoltech.ru. Affiliations: Skolkovo Institute of Science and Technology (Skoltech), Moscow Institute of Physics and Technology (MIPT), Sirius University of Science and Technology (Sirius)."

Pseudocode: Yes
"Algorithm 1: MI estimator evaluation"

Open Source Code: Yes
"In the following repositories, we provide PyTorch implementations of the NN-based estimators we used: https://github.com/VanessB/pytorch-kld and https://github.com/VanessB/pytorch-mienf."

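The method rests on the invariance of MI under invertible transformations: once normalizing flows map X and Y to (approximately) jointly Gaussian latents, MI can be computed in closed form from the latent covariance. Below is a minimal, illustrative PyTorch sketch of that closed-form step only; the flows `f` and `g` are hypothetical placeholders for trained models, and this is not the API of the repositories linked above.

```python
import torch

def gaussian_mi_from_latents(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Closed-form MI for (approximately) jointly Gaussian latents u, v:
    I(U; V) = 0.5 * (log det C_uu + log det C_vv - log det C),
    where C is the covariance of the concatenated vector [u, v]."""
    z = torch.cat([u, v], dim=1)        # (n, d_u + d_v) paired samples
    c = torch.cov(z.T)                  # joint covariance matrix
    d_u = u.shape[1]
    c_uu, c_vv = c[:d_u, :d_u], c[d_u:, d_u:]
    return 0.5 * (torch.logdet(c_uu) + torch.logdet(c_vv) - torch.logdet(c))

# Usage with hypothetical trained flows f and g mapping samples to latents:
# u, v = f(x_samples), g(y_samples)
# mi_nats = gaussian_mi_from_latents(u, v)
```
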
Open Datasets: Yes
"To evaluate our estimator, we utilize synthetic datasets with known mutual information. For the first set of experiments, we map a low-dimensional correlated Gaussian distribution to a distribution of high-dimensional images of geometric shapes (see Figure 2). For the second set of experiments, incompressible, high-dimensional, non-Gaussian-based distributions are considered. We also use two additional non-Gaussian-based families of distributions with known closed-form expressions for MI and easy sampling procedures: the multivariate Student distribution [52] and the smoothed uniform distribution [14]. For this example, we use the MNIST dataset of handwritten digits [53]."

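For intuition, the first benchmark family admits a simple construction: a correlated Gaussian pair whose ground-truth MI is known in closed form, I(X; Y) = -0.5 Σ_i log(1 - ρ_i²). Here is a hedged NumPy sketch of the low-dimensional sampler; the step that renders these variables as images of geometric shapes (Figure 2) is omitted.

```python
import numpy as np

def sample_correlated_gaussian(rho, n):
    """Draw n pairs (x, y); coordinate pair i has correlation rho[i]."""
    rho = np.asarray(rho, dtype=float)               # shape (d,)
    eps_x = np.random.randn(n, rho.shape[0])
    eps_y = np.random.randn(n, rho.shape[0])
    x = eps_x
    y = rho * eps_x + np.sqrt(1.0 - rho**2) * eps_y  # corr(x_i, y_i) = rho_i
    return x, y

def ground_truth_mi(rho):
    """I(X; Y) in nats for the construction above."""
    rho = np.asarray(rho, dtype=float)
    return -0.5 * np.sum(np.log1p(-rho**2))          # log1p(-r^2) = log(1 - r^2)

rho = np.full(8, 0.9)                                # 8-dimensional example
x, y = sample_correlated_gaussian(rho, n=5_000)
print("ground-truth MI (nats):", ground_truth_mi(rho))
```
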
Dataset Splits: No
"The paper reports sample sizes and training parameters but does not give explicit training/validation/test splits (e.g., percentages or specific counts) for the evaluation data."

Hardware Specification: Yes
"Nvidia Titan RTX was used to train the models."

Software Dependencies: No
"The paper mentions the normflows package [54] and the Adam [55] optimizer but does not specify version numbers for these or other software dependencies."

Experiment Setup: Yes
"For the tests described in Section 5, we use the architectures listed in Table 2. For the flow models, we use the normflows package [54]. The autoencoders are trained via the Adam [55] optimizer on 5 * 10^3 images with a batch size of 5 * 10^3, a learning rate of 10^-3, and MAE loss for 2 * 10^3 epochs. The MINE/NWJ/Nishiyama critic network is trained via the Adam optimizer on 5 * 10^3 pairs of images with a batch size of 512 and a learning rate of 10^-3 for 5 * 10^3 epochs. The GLOW normalizing flow is trained via the Adam optimizer on 10 * 10^3 images with a batch size of 1024 and a learning rate decaying from 5 * 10^-4 to 1 * 10^-5 for 2 * 10^3 epochs."
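
As a concrete reading of the GLOW schedule quoted above, here is a minimal PyTorch sketch. The decay shape is not stated in the excerpt, so a per-epoch exponential decay from 5e-4 to 1e-5 over 2,000 epochs is assumed, and `flow` is a dummy stand-in for a normflows model.

```python
import torch

flow = torch.nn.Linear(4, 4)  # dummy stand-in for the GLOW flow model

epochs, lr_start, lr_end = 2_000, 5e-4, 1e-5
optimizer = torch.optim.Adam(flow.parameters(), lr=lr_start)
# Per-epoch factor gamma chosen so that lr_start * gamma**epochs == lr_end.
gamma = (lr_end / lr_start) ** (1.0 / epochs)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)

for epoch in range(epochs):
    # ... one epoch of maximum-likelihood training with batches of 1024 ...
    scheduler.step()
```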