Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Collaborative Learning of Discrete Distributions under Heterogeneity and Communication Constraints
Authors: Xinmeng Huang, Donghwan Lee, Edgar Dobriban, Hamed Hassani
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Further, we provide experimental results using both synthetic data and n-gram frequency estimation in the text domain, which corroborate its efficiency. |
| Researcher Affiliation | Academia | Xinmeng Huang, Donghwan Lee: Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania, Philadelphia, PA 19104. Edgar Dobriban: Department of Statistics and Data Science, University of Pennsylvania, Philadelphia, PA 19104. Hamed Hassani: Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104. |
| Pseudocode | Yes | Algorithm 1 SHIFT: Sparse Heterogeneity Inspired collaboration and Fine-Tuning |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | Yes | We test SHIFT on synthetic data as well as the Shakespeare dataset [11]. The Shakespeare dataset was proposed as a benchmark for federated learning in [11]. |
| Dataset Splits | No | The paper mentions generating synthetic data and drawing datapoints for the Shakespeare dataset, but it does not specify explicit train/validation/test splits for reproduction. |
| Hardware Specification | No | The paper states "Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A]", indicating no specific hardware details are provided. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., programming language versions, library versions, or solver versions). |
| Experiment Setup | Yes | We set the threshold parameter α = ln(n) and the trimming proportion ω = 0.1. We set the dimension to d = 300 and run the simulation by varying n, T, s. We set the uniform distribution, p = (1/d, ..., 1/d), as the central distribution. |
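
The synthetic setup quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the perturbation scheme (a small mean-zero shift on s randomly chosen coordinates), the `scale` parameter, and the example values of n, T, s are all assumptions made here. The coordinate-wise trimmed mean only illustrates what a trimming proportion ω = 0.1 does; it is not the SHIFT algorithm, and the sketch does not model how the paper uses the threshold α = ln(n).

```python
import math
import numpy as np


def make_sparse_heterogeneous_distributions(d, T, s, rng, scale=0.25):
    """Hypothetical generator: T distributions over d symbols, each differing
    from the uniform central distribution in at most s coordinates (s >= 2)."""
    central = np.full(d, 1.0 / d)
    dists = np.tile(central, (T, 1))
    for t in range(T):
        idx = rng.choice(d, size=s, replace=False)
        # Mean-zero perturbation keeps the total probability mass equal to 1.
        delta = rng.uniform(-scale / d, scale / d, size=s)
        delta -= delta.mean()
        dists[t, idx] += delta
    return central, dists


def empirical_frequencies(dists, n, rng):
    """Draw n samples from each distribution and return the observed frequencies."""
    return np.stack([rng.multinomial(n, p) / n for p in dists])


def trimmed_mean(values, omega):
    """Coordinate-wise trimmed mean: drop the floor(omega * T) smallest and
    largest entries in each column, then average the rest."""
    T = values.shape[0]
    k = int(omega * T)
    ordered = np.sort(values, axis=0)
    return ordered[k:T - k].mean(axis=0)


def run_example(n=1000, T=20, s=10, d=300, omega=0.1, seed=0):
    """Illustrative run; n, T, s are example values (the source only says they vary)."""
    rng = np.random.default_rng(seed)
    alpha = math.log(n)  # threshold value from the source; not used in this sketch
    central, dists = make_sparse_heterogeneous_distributions(d, T, s, rng)
    freqs = empirical_frequencies(dists, n, rng)
    estimate = trimmed_mean(freqs, omega)
    l1_error = float(np.abs(estimate - central).sum())
    return alpha, l1_error
```

Calling `run_example()` returns the threshold value ln(n) together with the L1 distance between the trimmed-mean estimate and the uniform central distribution, which gives a rough baseline to compare against when varying n, T, and s.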