Private and Communication-Efficient Algorithms for Entropy Estimation

Authors: Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Munoz Medina, Umar Syed

NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we present two sets of experiments to support our theoretical findings. First, we demonstrate that Algorithm 1 is indeed able to estimate the Shannon entropy of tree-structured distributions with linear sample complexity in d. Thus it has a superior sample complexity compared to the state-of-the-art non-interactive method [12, 7], which has a quadratic sample complexity in d. The sample complexity is defined here in terms of the number of observations from pairs of marginals. In the second set of experiments, we use our Algorithm 5 to estimate the collision entropy of discrete distributions, and compare its performance to that of the best-known communication-efficient, non-private algorithm for this task (which we refer to as Skorski's algorithm [36]). (An illustrative sketch of the tree-entropy decomposition underlying the first experiment appears after the table.)
Researcher Affiliation | Collaboration | Gecia Bravo-Hermsdorff, Department of Statistics, University College London, gecia.bravo@gmail.com; Róbert Busa-Fekete, Google Research, busarobi@google.com; Mohammad Ghavamzadeh, Google Research, ghavamza@google.com; Andres Muñoz Medina, Google Research, ammedina@google.com; Umar Syed, Google Research, usyed@google.com
Pseudocode | Yes | Algorithm 1: Shannon entropy estimation for tree-structured distribution
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | No | The paper describes generating synthetic data for experiments: "To create a random tree-structured distribution, we first sample the skeleton of the distribution by taking the maximum spanning tree of a complete graph with d nodes and edge weights distributed according to a standard normal." and "we drew samples from a discrete exponential distribution p_i ∝ e^{-i} with support size k = 1000". It does not provide access information for a publicly available or open dataset. (A sketch of this data-generation procedure appears after the table.)
Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits. It describes how synthetic data was generated for experiments but not how it was partitioned for different phases of evaluation.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers needed to replicate the experiment.
Experiment Setup | No | The paper describes how the synthetic data was generated and how comparisons were made (e.g., "evaluate it over 100 repetitions and report the average"). However, it does not provide detailed experimental setup information such as the hyperparameters (learning rate, batch size, epochs, optimizer settings) that are typically reported when training machine learning models.
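
For context on the "Research Type" row: the reason pairwise marginals suffice for tree-structured distributions is the Markov-tree identity H(X_1, ..., X_d) = Σ_{(i,j)∈T} H(X_i, X_j) − Σ_i (deg(i) − 1) H(X_i). The sketch below is a plain plug-in baseline built on that identity, not the paper's Algorithm 1 (which is additionally private and communication-efficient); the function names, the `pair_samples` input format, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np
from collections import defaultdict

def plugin_entropy(counts):
    # Plug-in (maximum-likelihood) Shannon entropy, in nats, from empirical counts.
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))

def tree_entropy_estimate(pair_samples, k):
    # pair_samples: dict {(i, j): int array of shape (n, 2)} holding draws of the
    # pairwise marginal (X_i, X_j) for every edge (i, j) of the tree skeleton;
    # symbols lie in {0, ..., k-1}.
    # Uses the Markov-tree identity
    #   H(X_1, ..., X_d) = sum_{(i,j) in T} H(X_i, X_j) - sum_i (deg(i) - 1) H(X_i).
    degree = defaultdict(int)
    node_counts = defaultdict(lambda: np.zeros(k))
    estimate = 0.0
    for (i, j), s in pair_samples.items():
        joint = np.zeros((k, k))
        np.add.at(joint, (s[:, 0], s[:, 1]), 1.0)   # empirical pairwise joint counts
        estimate += plugin_entropy(joint.ravel())
        degree[i] += 1
        degree[j] += 1
        np.add.at(node_counts[i], s[:, 0], 1.0)     # pooled marginal counts per node
        np.add.at(node_counts[j], s[:, 1], 1.0)
    for i, deg in degree.items():
        estimate -= (deg - 1) * plugin_entropy(node_counts[i])
    return estimate
```

A comparison in the spirit of the quoted setup would call this estimator on pairwise samples of increasing size and average the error over repetitions (the paper reports averages over 100 repetitions).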
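The "Open Datasets" row quotes how the synthetic data were generated. The sketch below is a reconstruction of that description under stated assumptions, not released code: the use of networkx for the maximum spanning tree, the function names, and the example sizes (d = 20, 10,000 draws) are mine.

```python
import numpy as np
import networkx as nx

def random_tree_skeleton(d, rng):
    # Skeleton of a random tree-structured distribution: maximum spanning tree of
    # the complete graph on d nodes with i.i.d. standard-normal edge weights,
    # as described in the quoted text.
    G = nx.complete_graph(d)
    for u, v in G.edges():
        G[u][v]["weight"] = rng.standard_normal()
    return list(nx.maximum_spanning_tree(G).edges())

def discrete_exponential(k=1000):
    # Discrete exponential distribution p_i proportional to e^{-i} on {0, ..., k-1},
    # as used in the collision-entropy experiments.
    w = np.exp(-np.arange(k, dtype=float))
    return w / w.sum()

rng = np.random.default_rng(0)
edges = random_tree_skeleton(d=20, rng=rng)       # tree skeleton for the Shannon-entropy experiment
p = discrete_exponential(k=1000)
samples = rng.choice(len(p), size=10_000, p=p)    # i.i.d. draws for a collision-entropy estimator
```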