Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation
Authors: Weihao Gao, Sewoong Oh, Pramod Viswanath
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments suggest that the proposed estimator outperforms state-of-the-art entropy estimators, and the gap increases with correlation. The idea of using k-NN distance as bandwidth for entropy estimation was originally proposed by Kozachenko and Leonenko in [9], and is a special case of the k-LNN method we propose with degree 0 and a step kernel. We refer to Section 4 for a formal comparison. Another popular resubstitution entropy estimator is to use KDE in (3) [7], which is a special case of the k-LNN method with degree 0, and the Gaussian kernel is used in simulations. As comparison, we also study a new estimator [8] based on von Mises expansion (as opposed to simple re-substitution) which has an improved convergence rate in the large sample regime. We relegate simulation results to Section. B in the appendix. |
| Researcher Affiliation | Academia | Weihao Gao , Sewoong Oh , and Pramod Viswanath University of Illinois at Urbana-Champaign Urbana, IL 61801 EMAIL |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper mentions generating i.i.d. samples from a 2-dimensional Gaussian and other specific distributions for experiments, but does not provide access information (link, DOI, formal citation) to a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions using n = 500 samples and averaging over 100 instances for experiments but does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers that would be required to replicate the experiments. |
| Experiment Setup | Yes | Concretely, we choose, for each sample point Xi, the bandwidth h Xi to be the the distance to its k-th nearest neighbor ρk,i. Precisely, we propose the following k-Local Nearest Neighbor (k-LNN) entropy estimator of degree-2: ... where k Z+ is the only hyper parameter determining the bandwidth. In practice k is a small integer fixed to be in the range 4 8. ... For some typical choices of k, we provide approximate evaluations below, where 0.0183( 6) indicates empirical mean µ = 183 10 4 with confidence interval 6 10 4. In these numerical evaluations, we truncated the summation at m = 50, 000. ... The truncation is important for computational efficiency, but the analysis works as long as m = O(n1/(2d) ε) for any positive ε that can be arbitrarily small. For a larger m, for example of Ω(n), those neighbors that are further away have a different asymptotic behavior. ... We use the proposed estimators with p {0, 1, 2} to estimate the mutual information between two jointly Gaussian random variables with correlation r, from n = 500 samples, using resubstitution methods explained in the next sections. Each point is averaged over 100 instances. |