Communication-Efficient Distributed Learning of Discrete Distributions

Authors: Ilias Diakonikolas, Elena Grigorescu, Jerry Li, Abhiram Natarajan, Krzysztof Onak, Ludwig Schmidt

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We provide the first non-trivial upper and lower bounds on the communication complexity of this basic estimation task in various settings of interest. Specifically, our results include the following: 1. When the unknown discrete distribution is unstructured and each server has only one sample, we show that any blackboard protocol (i.e., any protocol in which servers interact arbitrarily using public messages) that learns the distribution must essentially communicate the entire sample. 2. For the case of structured distributions, such as k-histograms and monotone distributions, we design distributed learning algorithms that achieve significantly better communication guarantees than the naive ones, and obtain tight upper and lower bounds in several regimes. (An illustrative sketch of the naive baseline protocol appears after this table.)
Researcher Affiliation | Collaboration | Ilias Diakonikolas, CS, USC, diakonik@usc.edu; Elena Grigorescu, CS, Purdue, elena-g@purdue.edu; Jerry Li, EECS & CSAIL, MIT, jerryzli@mit.edu; Abhiram Natarajan, CS, Purdue, nataraj2@purdue.edu; Krzysztof Onak, IBM Research, NY, konak@us.ibm.com; Ludwig Schmidt, EECS & CSAIL, MIT, ludwigs@mit.edu
Pseudocode | No | The paper describes algorithmic ideas but does not include any clearly labeled pseudocode blocks or algorithm figures.
Open Source Code | No | The paper does not contain any statements about releasing code, nor does it provide links to an open-source repository for the described methodology.
Open Datasets | No | This paper focuses on theoretical contributions (algorithms and bounds) for distributed learning and does not use or specify any particular public datasets for training. The term 'samples' refers to abstract data points in the theoretical model.
Dataset Splits | No | Since this is a theoretical paper, no empirical experiments are conducted that would require specifying training, validation, or testing dataset splits.
Hardware Specification | No | The paper is theoretical and focuses on algorithm design and lower bounds, not on empirical experimentation. Therefore, no hardware specifications for running experiments are provided.
Software Dependencies | No | The paper is theoretical and does not discuss implementation details or empirical results. Consequently, there are no mentions of specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and focuses on the design and analysis of algorithms and communication complexity. It does not include descriptions of experimental setups, hyperparameters, or system-level training settings.
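
To make the setting in the Research Type row concrete, the following is a minimal sketch of the naive baseline protocol for an unstructured distribution over a domain of size k: each of n servers holds a single sample and forwards it verbatim (about log2(k) bits) to a coordinator, which outputs the empirical distribution. This sketch is not taken from the paper; the function names, the Dirichlet-generated ground truth, and the choice of roughly 4k/eps^2 samples are illustrative assumptions based on the standard folklore bound for learning a discrete distribution in total variation distance.

```python
"""Illustrative sketch (not from the paper): the naive protocol in which
every server sends its one sample to a coordinator, which then outputs
the empirical distribution over the domain {0, ..., k-1}."""
import math
import numpy as np

def naive_protocol(samples, k):
    """Coordinator side: build the empirical distribution from all samples.

    samples : array of ints in range(k), one sample per server
    k       : domain size
    Returns (estimated distribution, total bits communicated).
    """
    counts = np.bincount(samples, minlength=k).astype(float)
    estimate = counts / len(samples)
    # Each server communicates one sample, i.e. about ceil(log2 k) bits.
    bits = len(samples) * math.ceil(math.log2(k))
    return estimate, bits

def total_variation(p, q):
    """Total variation distance between two distributions on the domain."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 100
    eps = 0.1
    # Folklore bound: on the order of k / eps^2 samples suffice for the
    # empirical distribution to be eps-close in total variation.
    n = int(4 * k / eps**2)
    p = rng.dirichlet(np.ones(k))          # unknown (illustrative) distribution
    samples = rng.choice(k, size=n, p=p)   # one sample per server
    p_hat, bits = naive_protocol(samples, k)
    print(f"TV error: {total_variation(p, p_hat):.3f}, bits sent: {bits}")
```

With n on the order of k/eps^2 samples, this naive protocol communicates roughly (k/eps^2)·log k bits. The paper's first result says that for unstructured distributions any blackboard protocol must essentially communicate the entire sample, i.e., this baseline is essentially unavoidable, while its algorithms for structured classes such as k-histograms and monotone distributions are designed to achieve significantly better communication guarantees.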