Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Communication-Optimal Distributed Clustering
Authors: Jiecao Chen, He Sun, David Woodruff, Qin Zhang
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our algorithms and demonstrate this phenomenon on real life datasets, showing that our algorithms are also very efficient in practice. 5 Experiments In this section we present experimental results for spectral graph clustering in the message passing and blackboard models. |
| Researcher Affiliation | Collaboration | Jiecao Chen Indiana University Bloomington, IN 47401 EMAIL He Sun University of Bristol Bristol, BS8 1UB, UK EMAIL David P. Woodruff IBM Research Almaden San Jose, CA 95120 EMAIL Qin Zhang Indiana University Bloomington, IN 47401 EMAIL |
| Pseudocode | No | The paper describes algorithms in prose and mathematical expressions but does not include structured pseudocode blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for their described methodology is openly available. |
| Open Datasets | No | The paper describes the datasets (Twomoons, Gauss, Sculpture) in detail, but it does not provide specific links, DOIs, or citations with author/year information for public access to these datasets. |
| Dataset Splits | No | The paper describes the datasets used but does not specify training, validation, or test splits by percentage or absolute counts, nor does it refer to standard predefined splits. |
| Hardware Specification | Yes | Our experiments were conducted on an IBM NeXtScale nx360 M4 server, which is equipped with 2 Intel Xeon E5-2652 v2 8-core processors, 32GB RAM and 250GB local storage. |
| Software Dependencies | No | We implemented the algorithms using multiple languages, including Matlab, Python and C++. The paper lists programming languages but does not provide specific version numbers for any software dependencies, libraries, or solvers. |
| Experiment Setup | Yes | We implemented the algorithms using multiple languages, including Matlab, Python and C++. Our experiments were conducted on an IBM NeXtScale nx360 M4 server, which is equipped with 2 Intel Xeon E5-2652 v2 8-core processors, 32GB RAM and 250GB local storage. In the message passing model each site samples 5n edges; in the blackboard model all sites jointly sample 10n edges and the chain has length 18. |