Parallel Correlation Clustering on Big Graphs
Authors: Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan
NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally that both algorithms outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15 speedup. ... In our experimental evaluation, we demonstrate that both algorithms gracefully scale up to graphs with billions of edges. |
| Researcher Affiliation | Academia | AMPLab, EECS at UC Berkeley, σStatistics at UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Kwik Cluster with Algorithm 2 C4 & Cluster Wild! |
| Open Source Code | Yes | Code available at https://github.com/pxinghao/Parallel Correlation Clustering. |
| Open Datasets | Yes | The real graphs listed in Table 1 were each tested with 100 different random orderings. ... DBLP-2011 [25, 26, 27]. ENWiki-2013 [25, 26, 27]. UK-2005 [25, 26, 27]. IT-2004 [25, 26, 27]. Web Base-2001 [25, 26, 27]. |
| Dataset Splits | No | No specific training, validation, or test dataset splits (e.g., percentages, sample counts, or cross-validation setup) were explicitly provided. |
| Hardware Specification | Yes | We ran all our experiments on Amazon EC2 s r3.8xlarge (32 v CPUs, 244Gb memory) instances, using 1-32 threads. |
| Software Dependencies | No | The paper states "Our parallel algorithms were all implemented in Scala" but does not specify any version numbers for Scala or other relevant libraries/solvers used. |
| Experiment Setup | Yes | Values of ε = 0.1, 0.5, 0.9 were used for C4 BSP, Cluster Wild! BSP and CDK. |