Private and Communication-Efficient Algorithms for Entropy Estimation
Authors: Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Muñoz Medina, Umar Syed
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we present two sets of experiments to support our theoretical findings. First, we demonstrate that Algorithm 1 is indeed able to estimate the Shannon entropy of tree-structured distributions with sample complexity linear in d. It thus has superior sample complexity compared to the state-of-the-art non-interactive method [12, 7], whose sample complexity is quadratic in d. Sample complexity is defined here as the number of observations drawn from pairs of marginals. In the second set of experiments, we use our Algorithm 5 to estimate the collision entropy of discrete distributions, and compare its performance to that of the best-known communication-efficient, non-private algorithm for this task (which we refer to as Skorski's algorithm [36]). (Plug-in sketches of both estimation tasks appear after the table.) |
| Researcher Affiliation | Collaboration | Gecia Bravo-Hermsdorff, Department of Statistics, University College London, gecia.bravo@gmail.com; Róbert Busa-Fekete, Google Research, busarobi@google.com; Mohammad Ghavamzadeh, Google Research, ghavamza@google.com; Andres Muñoz Medina, Google Research, ammedina@google.com; Umar Syed, Google Research, usyed@google.com |
| Pseudocode | Yes | Algorithm 1 Shannon entropy estimation for tree-structured distribution |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology. |
| Open Datasets | No | The paper describes generating synthetic data for its experiments: "To create a random tree-structured distribution, we first sample the skeleton of the distribution by taking the maximum spanning tree of a complete graph with d nodes and edge weights distributed according to a standard normal." and "we drew samples from a discrete exponential distribution p_i ∝ e^{-i} with support size k = 1000". It does not provide access information for a publicly available or open dataset. (A sketch reproducing this data-generation procedure appears after the table.) |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits. It describes how synthetic data was generated for experiments but not how it was partitioned for different phases of evaluation. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiment. |
| Experiment Setup | No | The paper describes how the synthetic data was generated and how comparisons were made (e.g., "evaluate it over 100 repetitions and report the average"). However, it does not provide specific experimental setup details such as hyperparameters (learning rate, batch size, epochs, optimizer settings), which are typical for training machine learning models. |
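For context on the first experiment: a distribution that factorizes over a tree T satisfies the standard decomposition H(X) = Σ_i H(X_i) − Σ_{(i,j)∈T} I(X_i; X_j), so its Shannon entropy is determined by the pairwise marginals alone. The sketch below is a minimal plug-in version of this idea, not the paper's Algorithm 1 (which additionally enforces privacy and communication constraints); all function and variable names are illustrative.

```python
import numpy as np

def plugin_entropy(counts):
    """Plug-in Shannon entropy (in nats) from an array of counts."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def tree_entropy_estimate(pair_samples, tree_edges):
    """Estimate H(X) for a tree-structured distribution from samples of its
    pairwise marginals, via H = sum_i H(X_i) - sum_{(i,j) in T} I(X_i; X_j).

    pair_samples: dict mapping an edge (i, j) to an (n, 2) int array of
                  samples from the joint marginal of (X_i, X_j).
    tree_edges:   list of tree edges (i, j).
    """
    node_entropy = {}  # each node's marginal entropy, recorded once
    mi_sum = 0.0
    for (i, j) in tree_edges:
        xy = pair_samples[(i, j)]
        k_i, k_j = xy[:, 0].max() + 1, xy[:, 1].max() + 1
        joint = np.zeros((k_i, k_j))
        np.add.at(joint, (xy[:, 0], xy[:, 1]), 1)  # 2D histogram of the pair
        h_ij = plugin_entropy(joint.ravel())
        h_i = plugin_entropy(joint.sum(axis=1))
        h_j = plugin_entropy(joint.sum(axis=0))
        mi_sum += h_i + h_j - h_ij  # empirical mutual information of the edge
        # Different edges give slightly different marginal estimates for a
        # shared node; this sketch simply keeps the first one seen.
        node_entropy.setdefault(i, h_i)
        node_entropy.setdefault(j, h_j)
    return sum(node_entropy.values()) - mi_sum
```

The point of the sketch is only that pairwise-marginal observations suffice for trees, which is what makes the paper's linear-in-d sample complexity plausible; the paper's contribution is achieving this privately and communication-efficiently.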
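For the second experiment, both Algorithm 5 and Skorski's algorithm build on collision counting. A hedged baseline (not either paper algorithm): the collision probability Σ_i p_i² has the unbiased pairwise estimator Σ_i c_i(c_i − 1) / (n(n − 1)), where c_i are symbol counts in n samples, and the collision entropy is H₂(p) = −log Σ_i p_i².

```python
import numpy as np

def collision_entropy_estimate(samples):
    """Baseline collision-entropy estimate H_2 = -log sum_i p_i^2, using
    the unbiased estimator of the collision probability: the fraction of
    unordered sample pairs that are equal. Needs enough samples for at
    least one collision, otherwise the log diverges."""
    n = len(samples)
    counts = np.bincount(samples)
    colliding_pairs = np.sum(counts * (counts - 1)) / 2
    collision_prob = colliding_pairs / (n * (n - 1) / 2)
    return float(-np.log(collision_prob))
```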
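The synthetic data quoted in the Open Datasets row can be reproduced along these lines. A minimal sketch assuming numpy and networkx; the seed, d, and sample size are placeholders, not values from the paper.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)  # placeholder seed, not from the paper

# Random tree skeleton: maximum spanning tree of a complete graph on d
# nodes with i.i.d. standard-normal edge weights.
d = 20  # illustrative dimension
G = nx.complete_graph(d)
for i, j in G.edges:
    G[i][j]["weight"] = rng.standard_normal()
tree_edges = list(nx.maximum_spanning_tree(G).edges)

# Discrete exponential distribution p_i ∝ e^{-i} with support size k = 1000,
# used in the collision-entropy experiments.
k = 1000
p = np.exp(-np.arange(k, dtype=float))
p /= p.sum()
samples = rng.choice(k, size=10_000, p=p)

# Ground-truth values the estimators are compared against. e^{-i}
# underflows to 0 for large i, so restrict the log to the positive mass.
shannon_entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))  # H(p), in nats
collision_entropy = -np.log(np.sum(p ** 2))             # H_2(p)
```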