Entropy testing and its application to testing Bayesian networks

Authors: Clément L. Canonne, Joy Qiping Yang

NeurIPS 2024

Reproducibility assessment (variable, result, and LLM response):
Research Type: Theoretical. This paper studies the problem of entropy identity testing: given sample access to a distribution p and a fully described distribution q (both discrete distributions over a domain of size k), and the promise that either p = q or |H(p) − H(q)| ≥ ε, where H(·) denotes the Shannon entropy, a tester must distinguish between the two cases with high probability. The paper establishes a near-optimal sample complexity bound of Θ(k/ε + 1/ε²) for this problem, and shows how to apply it to identity testing for in-degree-d, n-dimensional Bayesian networks, obtaining an upper bound of O(2^(d/2) n^(3/2)/ε² + n²/ε⁴).
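To make the decision rule concrete, here is a minimal Python sketch of a naive plug-in entropy identity tester: estimate H(p) from the empirical frequencies of the samples and compare it against the known H(q). This is an illustration of the problem setup only, not the paper's near-optimal Algorithm 1 (the plug-in estimator is biased and does not attain the stated sample complexity); the function names and the eps/2 acceptance threshold are illustrative choices.

```python
import math
from collections import Counter

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum_i p_i log p_i (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def naive_entropy_identity_test(samples, q_probs, eps):
    """Naive plug-in tester (illustrative, not the paper's algorithm).

    Estimates H(p) from empirical sample frequencies and accepts the
    hypothesis 'p = q' when the estimate is within eps/2 of the exactly
    computed H(q); under the promise |H(p) - H(q)| >= eps, a reliable
    estimate on either side lands the decision correctly.
    """
    counts = Counter(samples)
    n = len(samples)
    h_hat = -sum((c / n) * math.log(c / n) for c in counts.values())
    return abs(h_hat - shannon_entropy(q_probs)) < eps / 2
```

For example, with q uniform over 4 symbols (H(q) = log 4), samples that are empirically uniform are accepted, while samples concentrated on a single symbol (empirical entropy 0) are rejected.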
Researcher Affiliation: Academia. Clément L. Canonne, University of Sydney (clement.canonne@sydney.edu.au); Joy Qiping Yang, University of Sydney (qyan6238@uni.sydney.edu.au).
Pseudocode: Yes. Algorithm 1 (Entropy identity testing); Algorithm 2 (Identity testing for bounded-degree Bayes nets).
Open Source Code: No. The paper does not provide any statement or link indicating the availability of open-source code for the described methodology; the NeurIPS checklist also indicates no code for experiments.
Open Datasets: No. The paper is theoretical, focusing on sample complexity bounds and algorithms for distribution testing. It conducts no empirical studies with datasets, so no training data is mentioned.
Dataset Splits: No. The paper is theoretical and includes no empirical experiments or data splits for training, validation, or testing.
Hardware Specification: No. The paper is theoretical and describes no empirical experiments; therefore, no hardware specifications are provided.
Software Dependencies: No. The paper is theoretical and describes no empirical experiments; therefore, no software dependencies with version numbers are listed.
Experiment Setup: No. The paper is theoretical, focusing on algorithm design and analysis; it describes no empirical experimental setup, hyperparameters, or training configurations.