Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation
Authors: Weihao Gao, Sewoong Oh, Pramod Viswanath
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments suggest that the proposed estimator outperforms state-of-the-art entropy estimators, and the gap increases with correlation. The idea of using the k-NN distance as the bandwidth for entropy estimation was originally proposed by Kozachenko and Leonenko in [9], and is a special case of the k-LNN method we propose with degree 0 and a step kernel. We refer to Section 4 for a formal comparison. Another popular resubstitution entropy estimator uses KDE in (3) [7], which is a special case of the k-LNN method with degree 0; the Gaussian kernel is used in simulations. As a comparison, we also study a new estimator [8] based on the von Mises expansion (as opposed to simple resubstitution), which has an improved convergence rate in the large-sample regime. We relegate simulation results to Section B in the appendix. |
| Researcher Affiliation | Academia | Weihao Gao, Sewoong Oh, and Pramod Viswanath, University of Illinois at Urbana-Champaign, Urbana, IL 61801. {wgao9,swoh,pramodv}@illinois.edu |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | No | The paper mentions generating i.i.d. samples from a 2-dimensional Gaussian and other specific distributions for experiments, but does not provide access information (link, DOI, formal citation) to a publicly available or open dataset. |
| Dataset Splits | No | The paper mentions using n = 500 samples and averaging over 100 instances for experiments but does not provide specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, memory amounts) used for running the experiments are provided in the paper. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers that would be required to replicate the experiments. |
| Experiment Setup | Yes | Concretely, we choose, for each sample point X_i, the bandwidth h_{X_i} to be the distance to its k-th nearest neighbor ρ_{k,i}. Precisely, we propose the following k-Local Nearest Neighbor (k-LNN) entropy estimator of degree 2: ... where k ∈ Z+ is the only hyperparameter, determining the bandwidth. In practice k is a small integer fixed to be in the range 4–8. ... For some typical choices of k, we provide approximate evaluations below, where 0.0183(6) indicates empirical mean µ = 183 × 10⁻⁴ with confidence interval 6 × 10⁻⁴. In these numerical evaluations, we truncated the summation at m = 50,000. ... The truncation is important for computational efficiency, but the analysis works as long as m = O(n^{1/(2d)−ε}) for any positive ε that can be arbitrarily small. For a larger m, for example of Ω(n), those neighbors that are further away have a different asymptotic behavior. ... We use the proposed estimators with p ∈ {0, 1, 2} to estimate the mutual information between two jointly Gaussian random variables with correlation r, from n = 500 samples, using resubstitution methods explained in the next sections. Each point is averaged over 100 instances. |
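For readers attempting replication: the degree-0, step-kernel special case of k-LNN cited in the Research Type row is the classical Kozachenko–Leonenko k-NN entropy estimator. A minimal sketch of that baseline is below, assuming NumPy/SciPy and brute-force distance computation; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.special import digamma, gammaln

def kl_entropy(x, k=4):
    """Kozachenko-Leonenko k-NN differential entropy estimate (in nats).

    x : (n, d) array of i.i.d. samples; k : neighbor index (paper uses 4-8).
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Brute-force pairwise Euclidean distances (O(n^2 d); fine for n ~ 500).
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Column 0 of the sorted rows is the zero self-distance, so column k
    # holds the distance to the k-th nearest neighbor, rho_{k,i}.
    rho_k = np.sort(dists, axis=1)[:, k]
    # log volume of the unit d-ball: log(pi^{d/2} / Gamma(d/2 + 1))
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(rho_k))

# Sanity check against a known value: a standard 1-D Gaussian has
# differential entropy 0.5 * log(2*pi*e) ~= 1.4189 nats.
rng = np.random.default_rng(0)
samples = rng.standard_normal((2000, 1))
print(kl_entropy(samples, k=4))
```

The paper's degree-1 and degree-2 k-LNN estimators add local polynomial corrections on top of this, which the sketch does not attempt to reproduce.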