Beyond Normal: On the Evaluation of Mutual Information Estimators

Authors: Paweł Czyż, Frederic Grabowski, Julia Vogt, Niko Beerenwinkel, Alexander Marx

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From the abstract: "In this paper, we show how to construct a diverse family of distributions with known ground-truth mutual information and propose a language-independent benchmarking platform for mutual information estimators. We discuss the general applicability and limitations of classical and neural estimators in settings involving high dimensions, sparse interactions, long-tailed distributions, and high mutual information. Finally, we provide guidelines for practitioners on how to select an appropriate estimator adapted to the difficulty of the problem considered, and on the issues one needs to consider when applying an estimator to a new data set." From the contributions: "In this work we show a method of developing expressive distributions with known ground-truth mutual information (Sec. 2), propose forty benchmark tasks, and systematically study the properties of commonly used estimators, including representatives based on kernel or histogram density estimation, kNN estimation, and neural network-based estimators (Sec. 3). We address selected difficulties one can encounter when estimating mutual information (Sec. 4), such as sparsity of interactions, long-tailed distributions, invariance, and high mutual information. Finally, we provide recommendations for practitioners on how to choose a suitable estimator for particular problems (Sec. 6)." Figure 2 caption: "Mean MI estimates of nine estimators over n = 10 samples with N = 10 000 points each against the ground-truth value on all benchmark tasks, grouped by category. Color indicates relative negative bias (blue) and positive bias (red). Blank entries indicate that an estimator experienced numerical instabilities." A minimal sketch of the known-ground-truth idea is given below the table.
Researcher Affiliation | Academia | (1) Department of Biosystems Science and Engineering, ETH Zurich; (2) ETH AI Center, ETH Zurich; (3) Institute of Fundamental Technological Research, Polish Academy of Sciences; (4) Department of Computer Science, ETH Zurich; (5) SIB Swiss Institute of Bioinformatics
Pseudocode | No | The paper describes methods and procedures in text but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Accompanying code is available at http://github.com/cbg-ethz/bmi.
Open Datasets | No | The paper uses synthetic data generated from multivariate normal and Student distributions rather than named, publicly available datasets (such as CIFAR-10 or MNIST), so it provides no access information for a pre-existing public dataset. The standard Student sampling construction is sketched below the table.
Dataset Splits | No | The paper states: "We split the data set into two equal-sized parts (training and test) and optimized the statistics network on the training data set using the Adam optimiser with initial learning rate set to 0.1. We used batch size of 256 and ran each algorithm for up to 10 000 training steps with early stopping (checked every 250 training steps)." A distinct validation split with specific percentages or counts is not explicitly mentioned.
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | Yes | "We used the implementation provided in the scikit-learn [Pedregosa et al., 2011] Python package (version 1.2.2). We set the latent space dimension to the smaller of the dimensions of the considered random vectors X and Y." ... "implemented in Transfer Entropy library (version 1.10.0)." A CCA-based sketch matching the quoted latent-dimension rule is given below the table.
Experiment Setup | Yes | "We split the data set into two equal-sized parts (training and test) and optimized the statistics network on the training data set using the Adam optimiser with initial learning rate set to 0.1. We used batch size of 256 and ran each algorithm for up to 10 000 training steps with early stopping (checked every 250 training steps)." A sketch of this training protocol is given below the table.
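
To make the benchmark's central idea concrete (see the Research Type row), here is a minimal sketch assuming NumPy and scikit-learn rather than the paper's bmi package: for a bivariate Gaussian with correlation rho, the ground-truth mutual information is -0.5 * log(1 - rho^2) nats, which a kNN-based estimator should approximately recover.

    # Minimal sketch: compare a kNN-based MI estimate against a closed-form
    # ground truth. This is an illustration, not the paper's bmi benchmark code.
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    rng = np.random.default_rng(0)
    rho = 0.8  # correlation of the bivariate Gaussian (X, Y); illustrative value

    # Ground truth for a bivariate Gaussian: I(X; Y) = -0.5 * log(1 - rho^2) nats.
    ground_truth = -0.5 * np.log(1.0 - rho**2)

    # Draw N = 10 000 samples, matching the benchmark's per-task sample size.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    xy = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)
    x, y = xy[:, [0]], xy[:, 1]

    # Kraskov-style kNN estimate as implemented in scikit-learn.
    estimate = mutual_info_regression(x, y, n_neighbors=3, random_state=0)[0]
    print(f"ground truth: {ground_truth:.3f} nats, kNN estimate: {estimate:.3f} nats")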
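
The Student tasks mentioned in the Open Datasets row can be sampled with the standard Gaussian scale-mixture construction shown below; the dispersion matrix and degrees of freedom here are illustrative assumptions, not the paper's task definitions.

    # Standard construction of a multivariate Student distribution: a Gaussian
    # draw divided by the square root of an independent chi-square variable.
    import numpy as np

    def sample_student(mean, dispersion, dof, size, rng):
        """Draw `size` samples from a multivariate Student distribution."""
        z = rng.multivariate_normal(np.zeros(len(mean)), dispersion, size=size)
        w = rng.chisquare(dof, size=size) / dof
        return mean + z / np.sqrt(w)[:, None]

    rng = np.random.default_rng(0)
    dispersion = np.array([[1.0, 0.5], [0.5, 1.0]])
    samples = sample_student(np.zeros(2), dispersion, dof=3.0, size=10_000, rng=rng)
    # dof = 3 yields much heavier tails than a Gaussian with the same dispersion,
    # which is what makes the long-tailed benchmark tasks hard for estimators.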
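
The latent-dimension rule quoted in the Software Dependencies row corresponds to a CCA-based estimator. The sketch below is a hedged reconstruction, assuming scikit-learn's CCA as a stand-in for classical CCA and relying on the Gaussian identity I(X; Y) = -0.5 * sum_i log(1 - rho_i^2), where rho_i are the canonical correlations.

    # Hedged sketch of a CCA-based MI estimator; the identity is exact only for
    # jointly Gaussian data, and sklearn's CCA is an iterative approximation of
    # classical CCA.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    def cca_mutual_information(x, y):
        k = min(x.shape[1], y.shape[1])  # latent dimension, as in the quoted setup
        x_c, y_c = CCA(n_components=k).fit(x, y).transform(x, y)
        rhos = np.array([np.corrcoef(x_c[:, i], y_c[:, i])[0, 1] for i in range(k)])
        return -0.5 * np.sum(np.log1p(-rhos**2))  # log1p(-r^2) = log(1 - r^2)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(10_000, 3))
    y = x[:, :2] + rng.normal(size=(10_000, 2))  # jointly Gaussian, dim(Y) = 2
    print(f"CCA-based MI estimate: {cca_mutual_information(x, y):.3f} nats")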
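
Finally, the training protocol quoted in the Dataset Splits and Experiment Setup rows can be sketched as a Donsker-Varadhan (MINE-style) loop. Only the equal train/test split, the Adam optimiser with initial learning rate 0.1, batch size 256, the 10 000-step budget, and the 250-step check interval come from the quoted text; the statistics-network architecture (a small MLP) and the patience-based stopping rule are assumptions.

    # MINE-style sketch of the quoted protocol; expects float tensors x, y of
    # shape (N, dim). Architecture and patience are assumptions, not the paper's.
    import math
    import torch
    import torch.nn as nn

    def dv_bound(net, x, y):
        """Donsker-Varadhan lower bound on I(X; Y) for one batch (in nats)."""
        joint = net(torch.cat([x, y], dim=1)).squeeze(-1).mean()
        y_shuf = y[torch.randperm(len(y))]  # break pairing to emulate the product of marginals
        scores = net(torch.cat([x, y_shuf], dim=1)).squeeze(-1)
        return joint - (torch.logsumexp(scores, dim=0) - math.log(len(y)))

    def train_mine(x, y, steps=10_000, batch=256, check_every=250, patience=4):
        n = len(x) // 2  # two equal-sized parts: train and test
        x_tr, y_tr, x_te, y_te = x[:n], y[:n], x[n:], y[n:]
        net = nn.Sequential(nn.Linear(x.shape[1] + y.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
        opt = torch.optim.Adam(net.parameters(), lr=0.1)  # initial learning rate from the paper
        best, bad_checks = -float("inf"), 0
        for step in range(1, steps + 1):
            idx = torch.randint(0, n, (batch,))
            loss = -dv_bound(net, x_tr[idx], y_tr[idx])
            opt.zero_grad()
            loss.backward()
            opt.step()
            if step % check_every == 0:  # early stopping checked every 250 steps
                with torch.no_grad():
                    test_mi = dv_bound(net, x_te, y_te).item()
                if test_mi > best:
                    best, bad_checks = test_mi, 0
                else:
                    bad_checks += 1
                    if bad_checks >= patience:
                        break
        return best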