Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

On the Properties and Estimation of Pointwise Mutual Information Profiles

Authors: Paweł Czyż, Frederic Grabowski, Julia E. Vogt, Niko Beerenwinkel, Alexander Marx

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we analytically describe the profiles of multivariate normal distributions and show that for an expressive family of distributions, termed Bend and Mix Models, the profile can be accurately estimated using Monte Carlo methods. We then show how Bend and Mix Models can be used to study the limitations of existing mutual information estimators, investigate the behavior of neural critics used in variational estimators, and understand the effect of experimental outliers on mutual information estimation. Finally, we show how Bend and Mix Models can be used to obtain model-based Bayesian estimates of mutual information, suitable for problems with available domain expertise in which uncertainty quantification is necessary. The accompanying code is available at https://github.com/cbg-ethz/bmi. [...] Through a series of numerical experiments we demonstrate the usefulness of BMMs for creating non-trivial benchmark tasks; in particular, we discuss robustness of mutual information estimators to inlier and outlier noise. Additionally, BMMs allow us to investigate the properties of PMI profiles directly. [...] Table 1: Benchmark results. New benchmark problems are marked in green. Best-performing estimator in each row has been marked with bold font.
Researcher Affiliation | Academia | Paweł Czyż EMAIL, ETH AI Center and Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland; Frederic Grabowski EMAIL, Institute of Fundamental Technological Research, Warsaw, Poland; Julia E. Vogt EMAIL, Department of Computer Science, ETH Zurich, and SIB Swiss Institute of Bioinformatics, Zurich, Switzerland; Niko Beerenwinkel EMAIL, Department of Biosystems Science and Engineering, ETH Zurich, and SIB Swiss Institute of Bioinformatics, Basel, Switzerland; Alexander Marx EMAIL, Research Center Trustworthy Data Science and Security of the University Alliance Ruhr, Department of Statistics, TU Dortmund University, Dortmund, Germany
Pseudocode | No | The paper contains formal definitions, theorems, propositions, and proofs for the theoretical framework but does not include any explicitly labeled pseudocode or algorithm blocks. The methods are described textually or through mathematical expressions.
Open Source Code | Yes | The accompanying code is available at https://github.com/cbg-ethz/bmi.
Open Datasets | No | The paper describes generating synthetic data for its experiments and benchmarks, rather than using pre-existing open datasets. For example: "To illustrate how BMMs can be used to create expressive benchmark tasks, we implemented a benchmark of 26 continuous distributions in TensorFlow Probability on JAX (Dillon et al., 2017; Bradbury et al., 2018) (Appendix C.2)." and "For each BMM we sampled ten data sets with N = 5 000 points..." The code to generate the data is provided, but neither pre-existing open datasets nor the generated datasets themselves are directly accessible.
Dataset Splits | Yes | We simulated N = 5 000 data points from a mixture of four bivariate normal distributions (Fig. 4) with I(X; Y) = 0.36 and fitted the neural critics (see Appendix C) to half of the data, retaining the latter half as the test set, on which the final estimates were obtained, yielding I_NWJ = 0.33, I_DV = 0.32, I_NCE = 0.35. [...] For each distribution we sampled N data points once, ran a single Markov chain with 2000 warm-up steps and collected 800 samples.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running its experiments. It mentions 'on a standard laptop' in the context of computational cost estimation but not for the main experimental setup.
Software Dependencies | No | The paper mentions several software tools, such as 'TensorFlow Probability on JAX', 'NumPyro', and 'Snakemake workflows', along with citations to the papers introducing them. However, it does not provide specific version numbers for any of these software dependencies, which reproducibility requires.
Experiment Setup | Yes | We decided to use the histogram-based estimator (Cellucci et al., 2005; Darbellay & Vajda, 1999) with a fixed number of 10 bins per dimension and the popular KSG estimator (Kraskov et al., 2004) with k = 10 neighbors. Canonical correlation analysis (Kay, 1992; Brillinger, 2004) does not have any hyperparameters. Finally, we used variational estimators with the neural critic being a ReLU network of variant M (with 16 and 8 hidden neurons). As a preprocessing strategy, we followed Czyż et al. (2023, Appendix E.3) and transformed all samples to have zero empirical mean and unit variance along each dimension. [...] For each sample we ran a single Markov chain with 1,000 warm-up steps and 1,000 collected samples.
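The Monte Carlo estimation of PMI profiles quoted in the Research Type row can be sketched for the one tractable case the paper describes analytically, the multivariate normal. Everything below (function names, the choice rho = 0.8, the sample size) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def pmi_profile_mc(rho, n=50_000, seed=0):
    """Monte Carlo sample of the PMI profile of a bivariate normal with
    unit variances and correlation rho:
        PMI(x, y) = log p(x, y) - log p(x) - log p(y).
    The sample mean of the profile is a Monte Carlo estimate of I(X; Y)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    # Closed-form log-densities (standard normal marginals).
    log_joint = (-np.log(2 * np.pi) - 0.5 * np.log(1 - rho**2)
                 - (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2)))
    log_marginals = -0.5 * (x**2 + y**2) - np.log(2 * np.pi)
    return log_joint - log_marginals

pmi = pmi_profile_mc(rho=0.8)
mi_mc = float(pmi.mean())               # Monte Carlo estimate of I(X; Y)
mi_exact = -0.5 * np.log(1 - 0.8**2)    # closed form for the bivariate normal
```

A histogram of `pmi` approximates the profile itself; its mean recovers the mutual information, which is how the paper's benchmark tasks can carry exact ground-truth values.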
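The half/half protocol quoted in the Dataset Splits row fits a critic on one half of the data and evaluates a variational bound on the held-out half. A minimal sketch of the evaluation step, assuming the Donsker-Varadhan bound and substituting the closed-form optimal critic for a bivariate normal (the PMI, up to a constant) in place of a trained neural critic; the distribution and correlation are illustrative:

```python
import numpy as np

def dv_estimate(critic, x, y, rng):
    """Donsker-Varadhan lower bound E_p[f] - log E_{p_x p_y}[exp f],
    with the product of marginals approximated by shuffling y."""
    joint_term = critic(x, y).mean()
    marginal_term = np.log(np.mean(np.exp(critic(x, rng.permutation(y)))))
    return float(joint_term - marginal_term)

rng = np.random.default_rng(0)
n, rho = 5_000, 0.6
x, y = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n).T

# Fit on the first half, estimate on the held-out second half.
x_te, y_te = x[n // 2:], y[n // 2:]

# Hypothetical stand-in for a trained neural critic: for a bivariate normal
# the optimal DV critic is the PMI, known in closed form.
def pmi_critic(a, b):
    return (-0.5 * np.log(1 - rho**2)
            + (2 * rho * a * b - rho**2 * (a**2 + b**2)) / (2 * (1 - rho**2)))

i_dv = dv_estimate(pmi_critic, x_te, y_te, rng)
mi_true = -0.5 * np.log(1 - rho**2)   # ground truth for this toy distribution
```

With the optimal critic the population bound is tight, so `i_dv` lands near `mi_true` up to Monte Carlo noise; a trained neural critic would replace `pmi_critic`.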
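Of the settings quoted in the Experiment Setup row, the standardization preprocessing and the fixed-bin histogram estimator are simple enough to sketch. Only `bins=10` and the zero-mean/unit-variance transform come from the quoted text; the sampled distribution, correlation, and function names are illustrative assumptions:

```python
import numpy as np

def standardize(a):
    """Zero empirical mean and unit variance along each dimension."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

def histogram_mi(x, y, bins=10):
    """Plug-in MI estimate (in nats) from a 2D histogram with a fixed
    number of bins per dimension."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over columns
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over rows
    mask = p_xy > 0                          # skip empty bins (0 log 0 = 0)
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

rng = np.random.default_rng(0)
rho = 0.7
xy = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5_000)
xy = standardize(xy)
mi_hist = histogram_mi(xy[:, 0], xy[:, 1], bins=10)
```

For this Gaussian toy case the ground truth is -0.5 log(1 - 0.7^2), about 0.34 nats; the plug-in histogram estimate carries a finite-sample bias, which is part of what such benchmarks measure.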