Mutual Information Estimation via Normalizing Flows
Authors: Ivan Butakov, Aleksandr Tolmachev, Sofia Malanchuk, Anna Neopryatnaya, Alexey Frolov
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments with high-dimensional data are conducted to highlight the practical advantages of the proposed method. In Section 5, a series of experiments is performed to evaluate the proposed method and compare it to several other key MI estimators. |
| Researcher Affiliation | Academia | Butakov I. D. (Skoltech, MIPT, Sirius, butakov.id@phystech.su); Tolmachev A. D. (Skoltech, MIPT, tolmachev.ad@phystech.su); Malanchuk S. V. (MIPT, Skoltech, malanchuk.sv@phystech.su); Neopryatnaya A. M. (MIPT, Skoltech, neopryatnaya.am@skoltech.ru); Frolov A. A. (Skoltech, al.frolov@skoltech.ru). Skoltech = Skolkovo Institute of Science and Technology; MIPT = Moscow Institute of Physics and Technology; Sirius = Sirius University of Science and Technology. |
| Pseudocode | Yes | Algorithm 1 (MI estimator evaluation); a generic sketch of such an evaluation loop follows the table. |
| Open Source Code | Yes | In the following repositories, we provide PyTorch implementations of the NN-based estimators we used: https://github.com/VanessB/pytorch-kld and https://github.com/VanessB/pytorch-mienf. |
| Open Datasets | Yes | To evaluate our estimator, we utilize synthetic datasets with known mutual information. For the first set of experiments, we map a low-dimensional correlated Gaussian distribution to a distribution of high-dimensional images of geometric shapes (see Figure 2); a sketch of the closed-form ground truth for this case follows the table. For the second set of experiments, incompressible, high-dimensional non-Gaussian distributions are considered. Two additional non-Gaussian families with known closed-form MI expressions and easy sampling procedures are also used: the multivariate Student distribution [52] and the smoothed uniform distribution [14]. A further experiment uses the MNIST dataset of handwritten digits [53]. |
| Dataset Splits | No | The paper describes sample sizes for experiments and training parameters but does not provide explicit training/validation/test dataset splits (e.g., percentages or specific counts) for the data used for evaluation. |
| Hardware Specification | Yes | Nvidia Titan RTX was used to train the models. |
| Software Dependencies | No | The paper mentions using the 'normflows package [54]' and 'Adam [55] optimizer' but does not specify explicit version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For the tests described in Section 5, we use the architectures listed in Table 2. For the flow models, we use the normflows package [54]. The autoencoders are trained via the Adam [55] optimizer on 5 * 10^3 images with a batch size of 5 * 10^3, a learning rate of 10^-3, and MAE loss for 2 * 10^3 epochs. The MINE/NWJ/Nishiyama critic network is trained via the Adam optimizer on 5 * 10^3 pairs of images with a batch size of 512 and a learning rate of 10^-3 for 5 * 10^3 epochs. The GLOW normalizing flow is trained via the Adam optimizer on 10 * 10^3 images with a batch size of 1024 and a learning rate decaying from 5 * 10^-4 to 1 * 10^-5 for 2 * 10^3 epochs. Hedged training sketches for a critic-based baseline and a flow-based plug-in estimator follow the table. |
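
The correlated-Gaussian benchmark referenced in the Open Datasets row has a closed-form ground truth: for jointly Gaussian X, Y with componentwise correlations ρ_i, I(X; Y) = -1/2 Σ_i log(1 - ρ_i²) nats. The sketch below samples such a pair and computes this value; the helper names and the uniform choice of ρ are illustrative, not taken from the paper's code.

```python
import torch

def sample_correlated_gaussian(rho: torch.Tensor, n_samples: int):
    """Sample (X, Y) where each coordinate pair (X_i, Y_i) is a
    standard bivariate Gaussian with correlation rho[i]."""
    dim = rho.shape[0]
    x = torch.randn(n_samples, dim)
    eps = torch.randn(n_samples, dim)
    # Y_i = rho_i * X_i + sqrt(1 - rho_i^2) * eps_i  =>  corr(X_i, Y_i) = rho_i
    y = rho * x + torch.sqrt(1.0 - rho**2) * eps
    return x, y

def gaussian_mi(rho: torch.Tensor) -> torch.Tensor:
    """Closed-form MI in nats: I(X; Y) = -1/2 * sum_i log(1 - rho_i^2)."""
    return -0.5 * torch.log(1.0 - rho**2).sum()

rho = torch.full((8,), 0.7)  # illustrative: 8 components, rho = 0.7
x, y = sample_correlated_gaussian(rho, n_samples=5_000)
print(f"ground-truth MI: {gaussian_mi(rho).item():.3f} nats")
```

Because MI is invariant under injective transformations applied to each variable separately, pushing X and Y through image renderers, as in the paper's geometric-shapes benchmark, leaves this ground-truth value unchanged.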
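
The Experiment Setup row reports a MINE/NWJ/Nishiyama critic trained with Adam at batch size 512 and learning rate 10^-3. As a hedged illustration of one such baseline (not the paper's exact architecture), here is a minimal Donsker-Varadhan/MINE lower bound, I(X; Y) ≥ E[T(x, y)] - log E[exp T(x, y')], in plain PyTorch, reusing the sampler from the sketch above; the critic width and step count are placeholders.

```python
import math
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Small MLP critic T(x, y); illustrative architecture."""
    def __init__(self, dim_x: int, dim_y: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def mine_lower_bound(critic, x, y):
    """Donsker-Varadhan bound: E[T(x, y)] - log E[exp T(x, y_shuffled)].
    Shuffling y within the batch approximates the product of marginals."""
    t_joint = critic(x, y).mean()
    y_perm = y[torch.randperm(y.shape[0])]
    t_marginal = torch.logsumexp(critic(x, y_perm), dim=0) - math.log(y.shape[0])
    return t_joint - t_marginal

# Training loop with the quoted hyperparameters (batch 512, lr 1e-3);
# the step count is illustrative, and the naive gradient of the
# log-mean-exp term is biased (full MINE uses a moving-average correction).
critic = Critic(dim_x=8, dim_y=8)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
for step in range(2_000):
    xb, yb = sample_correlated_gaussian(rho, 512)
    loss = -mine_lower_bound(critic, xb, yb)  # maximize the bound
    opt.zero_grad()
    loss.backward()
    opt.step()
```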
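
The flow models are built with the normflows package [54]. The sketch below assumes the Real NVP pattern from the normflows documentation (`nf.flows.MaskedAffineFlow`, `nf.nets.MLP`, `nf.NormalizingFlow`) and plugs fitted log-densities into I(X; Y) = E[log p(x, y) - log p(x) - log p(y)]. This is a generic flow-based plug-in estimator for illustration, not necessarily the estimator proposed in the paper; the layer counts and epochs are placeholders.

```python
import torch
import normflows as nf

def build_realnvp(dim: int, n_layers: int = 8) -> nf.NormalizingFlow:
    """Real NVP-style flow with alternating checkerboard masks."""
    b = torch.tensor([i % 2 for i in range(dim)], dtype=torch.float32)
    flows = []
    for i in range(n_layers):
        s = nf.nets.MLP([dim, 2 * dim, dim], init_zeros=True)  # scale net
        t = nf.nets.MLP([dim, 2 * dim, dim], init_zeros=True)  # shift net
        mask = b if i % 2 == 0 else 1 - b
        flows.append(nf.flows.MaskedAffineFlow(mask, t, s))
    return nf.NormalizingFlow(q0=nf.distributions.DiagGaussian(dim), flows=flows)

def plugin_mi(x: torch.Tensor, y: torch.Tensor, epochs: int = 500) -> float:
    """Fit flows to p(x,y), p(x), p(y) by maximum likelihood, then
    plug log-densities into I = E[log p(x,y) - log p(x) - log p(y)].
    Evaluating on the training samples is a simplification; held-out
    samples would be preferable."""
    xy = torch.cat([x, y], dim=-1)
    models = {"xy": build_realnvp(xy.shape[1]),
              "x": build_realnvp(x.shape[1]),
              "y": build_realnvp(y.shape[1])}
    data = {"xy": xy, "x": x, "y": y}
    for key, model in models.items():
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(epochs):
            loss = model.forward_kld(data[key])  # mean negative log-likelihood
            opt.zero_grad()
            loss.backward()
            opt.step()
    with torch.no_grad():
        return (models["xy"].log_prob(xy) - models["x"].log_prob(x)
                - models["y"].log_prob(y)).mean().item()
```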
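
Finally, the Pseudocode row cites Algorithm 1 (MI estimator evaluation). The paper's algorithm is not reproduced here; the following is a generic sketch of such an evaluation loop, reusing the ground-truth helpers and the `plugin_mi` estimator from the sketches above, with sample sizes and trial counts chosen arbitrarily.

```python
from typing import Callable
import torch

def evaluate_estimator(
    estimate_mi: Callable[[torch.Tensor, torch.Tensor], float],
    rho: torch.Tensor,
    sample_sizes=(1_000, 5_000, 10_000),
    n_trials: int = 5,
) -> None:
    """Run an MI estimator on correlated-Gaussian data and report the
    deviation from the closed-form ground truth."""
    true_mi = gaussian_mi(rho).item()
    for n in sample_sizes:
        errs = []
        for _ in range(n_trials):
            x, y = sample_correlated_gaussian(rho, n)
            errs.append(estimate_mi(x, y) - true_mi)
        print(f"n={n}: true MI = {true_mi:.3f} nats, "
              f"mean error = {sum(errs) / n_trials:+.3f} nats")

# Example: evaluate the flow-based plug-in estimator sketched above
# (slow, since it trains three flows per trial).
evaluate_estimator(plugin_mi, rho)
```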