Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Connecting Jensen–Shannon and Kullback–Leibler Divergences: A New Bound for Representation Learning

Authors: Reuben Dorent, Polina Golland, William (Sandy) Wells

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our lower bound is tight when applied to MI estimation. We compared our lower bound to state-of-the-art neural estimators of variational lower bound across a range of established reference scenarios. Our lower bound estimator consistently provides a stable, low-variance estimate of a tight lower bound on MI. We also demonstrate its practical usefulness in the context of the Information Bottleneck framework.
Researcher Affiliation | Academia | Reuben Dorent, Inria, EMAIL; Polina Golland, MIT, EMAIL; William Wells III, Harvard, MIT, EMAIL
Pseudocode | Yes | Algorithm 1: Algorithmic implementation of the Ξ function as the inverse of its known inverse Ξ^-1
Open Source Code | Yes | Implementation details are available in Appendix E.2 (footnote 2: https://github.com/ReubenDo/JSDlowerbound).
Open Datasets | Yes | Table 1: Generalization performance (%) on MNIST dataset. Performance is evaluated by the mean classification accuracy on the MNIST test set after training on the MNIST training set.
Dataset Splits | Yes | Table 1: Generalization performance (%) on MNIST dataset. Performance is evaluated by the mean classification accuracy on the MNIST test set after training on the MNIST training set.
Hardware Specification | Yes | The computational time analysis is developed on a server with CPU Intel Xeon Platinum 8468 48-Core Processor and an NVIDIA GPU H100 and reported in Table 7.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, PyTorch, or CUDA). It mentions the Adam optimizer but names no framework or version.
Experiment Setup | Yes | Training is performed for 4000 steps using the Adam optimizer and batch size N = 64, matching the architecture and hyperparameters from prior work for comparability. As the function Ξ is strictly increasing, maximizing Ξ(log 2 − L_CE) is equivalent to minimizing the cross-entropy loss L_CE. Therefore, the approximation of Ξ is not used during optimization. Implementation details are available in Appendix E.2.
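
The Pseudocode and Experiment Setup entries describe Ξ only as a strictly increasing function that is evaluated by inverting its known inverse Ξ^-1. The sketch below illustrates that general idea and is not the paper's Algorithm 1. It assumes bisection as the root-finding method and uses a hypothetical stand-in, `xi_inverse`, because the actual Ξ^-1 is not reproduced on this page. The names `invert_increasing` and `xi` are likewise illustrative.

```python
import math


def xi_inverse(y):
    """Hypothetical stand-in for a known, strictly increasing inverse.

    This is NOT the paper's function; it is chosen only so the sketch runs.
    """
    return math.expm1(y)  # e^y - 1, strictly increasing, equals 0 at y = 0


def invert_increasing(f, target, lo=0.0, hi=1.0, tol=1e-10, max_iter=200):
    """Return x >= lo with f(x) close to target, for strictly increasing f."""
    if target < f(lo):
        raise ValueError("target lies below f(lo); widen the bracket")
    # Grow the upper end of the bracket until it encloses the target.
    while f(hi) < target:
        hi *= 2.0
    # Standard bisection: halve the bracket until it is narrower than tol.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def xi(x):
    """Evaluate the stand-in Xi by numerically inverting xi_inverse."""
    return invert_increasing(xi_inverse, x)


# A lower cross-entropy loss gives a larger value of xi(log 2 - loss),
# so ranking models by the bound or by the loss gives the same order.
losses = [0.6, 0.4, 0.2]
bounds = [xi(math.log(2) - loss) for loss in losses]
```

Because any strictly increasing function preserves ordering, the values in `bounds` increase as the entries of `losses` decrease. This is why, per the Experiment Setup entry, the numerical evaluation of Ξ can be left out of the training loop and applied only when reporting the bound.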