Mutual Information Estimation via $f$-Divergence and Data Derangements
Authors: Nunzio Alexandro Letizia, Nicola Novello, Andrea M. Tonello
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The comparison with state-of-the-art neural estimators, through extensive experimentation within established reference scenarios, shows that our approach offers higher accuracy and lower complexity. Section 6, Experimental Results: In this section, we firstly describe the architectures of the proposed estimators. Then, we outline the data used to estimate the MI, comment on the performance of the discussed estimators in different scenarios, also analyzing their computational complexity. Finally, we present the outcomes of the self-consistency tests [20] over image datasets. |
| Researcher Affiliation | Academia | Nunzio A. Letizia, Nicola Novello, Andrea M. Tonello; University of Klagenfurt; {nunzio.letizia,nicola.novello,andrea.tonello}@aau.at |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. Methods are described textually or mathematically. |
| Open Source Code | Yes | Our implementation can be found at https://github.com/tonellolab/fDIME |
| Open Datasets | Yes | We use the images collected in the MNIST [33] and Fashion MNIST [34] datasets. In the first setting (called Gaussian), a multidimensional Gaussian distribution is sampled to obtain x and n samples, independently. In the second setting (referred to as cubic), the nonlinear transformation $y \mapsto y^3$ is applied to the Gaussian samples (see the data-generation sketch after the table). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, but rather describes how data is generated (e.g., Gaussian, cubic) or refers to standard datasets like MNIST/Fashion MNIST where splits are common knowledge but not explicitly stated here. |
| Hardware Specification | Yes | A fundamental characteristic of each algorithm is the computational time. The computational time analysis is developed on a server with CPU AMD Ryzen Threadripper 3960X 24-Core Processor and GPU MSI GeForce RTX 3090 Gaming X Trio 24G, 24GB GDDR6X. |
| Software Dependencies | Yes | We implemented a PyTorch [31] version of the code produced by the authors of [24], to unify NJEE with all the other MI estimators. Each neural estimator is trained using the Adam optimizer [32], with learning rate 5·10⁻⁴, β1 = 0.9, β2 = 0.999. |
| Experiment Setup | Yes | Each neural network is trained for 4k iterations for each stair step, with a batch size of 64 samples (N = 64). Each neural estimator is trained using the Adam optimizer [32], with learning rate 5·10⁻⁴, β1 = 0.9, β2 = 0.999. The batch size is initially set to N = 64 (see the training-configuration sketch after the table). |
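The Gaussian and cubic settings quoted above correspond to the standard multivariate-Gaussian MI benchmark. The following is a minimal sketch of how such data can be generated, assuming the usual construction in which the true MI equals $-\frac{d}{2}\log(1-\rho^2)$; the function name `sample_batch` and its arguments are illustrative, not taken from the authors' code.

```python
import numpy as np

def sample_batch(n_samples, d, target_mi, cubic=False, rng=None):
    """Draw (x, y) pairs whose ground-truth mutual information is target_mi (in nats)."""
    rng = np.random.default_rng() if rng is None else rng
    # Per-dimension correlation rho chosen so that -d/2 * log(1 - rho^2) = target_mi.
    rho = np.sqrt(1.0 - np.exp(-2.0 * target_mi / d))
    x = rng.standard_normal((n_samples, d))      # x ~ N(0, I_d)
    n = rng.standard_normal((n_samples, d))      # independent noise samples
    y = rho * x + np.sqrt(1.0 - rho ** 2) * n    # "Gaussian" setting
    if cubic:
        y = y ** 3                               # "cubic" setting: y -> y^3 leaves the MI unchanged
    return x, y

x, y = sample_batch(n_samples=64, d=5, target_mi=2.0, cubic=True)
```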
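The reported training configuration (Adam with learning rate 5·10⁻⁴, β1 = 0.9, β2 = 0.999, batch size N = 64, 4k iterations per stair step) can be sketched as below. The network architecture, the KL-type discriminative objective, and the cyclic-shift derangement are illustrative stand-ins, not the authors' f-DIME implementation.

```python
import math
import torch
from torch import nn, optim

d, batch_size, iters_per_step, target_mi = 5, 64, 4000, 2.0
rho = math.sqrt(1.0 - math.exp(-2.0 * target_mi / d))  # same construction as the sketch above

# Illustrative discriminator; the paper's architectures differ.
net = nn.Sequential(
    nn.Linear(2 * d, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1), nn.Softplus(),  # keeps the density-ratio estimate positive
)
opt = optim.Adam(net.parameters(), lr=5e-4, betas=(0.9, 0.999))  # reported hyperparameters

for _ in range(iters_per_step):
    x = torch.randn(batch_size, d)
    y = rho * x + math.sqrt(1.0 - rho ** 2) * torch.randn(batch_size, d)
    # Cyclic shift of y: a simple derangement, so no sample is paired with itself.
    y_deranged = torch.roll(y, shifts=1, dims=0)
    d_joint = net(torch.cat([x, y], dim=1))
    d_marg = net(torch.cat([x, y_deranged], dim=1))
    # KL-type discriminative objective: its maximizer approximates p(x, y) / (p(x) p(y)).
    loss = -(torch.log(d_joint + 1e-8).mean() - d_marg.mean())
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():
    mi_estimate = torch.log(net(torch.cat([x, y], dim=1)) + 1e-8).mean().item()
print(f"estimated MI ~ {mi_estimate:.2f} nats (ground truth {target_mi})")
```

A cyclic shift of the batch is only one particular derangement (a permutation with no fixed points); the paper's contribution is to sample derangements instead of arbitrary random permutations, which can pair a sample with itself.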