Parallel Correlation Clustering on Big Graphs

Authors: Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We demonstrate experimentally that both algorithms outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15 speedup. ... In our experimental evaluation, we demonstrate that both algorithms gracefully scale up to graphs with billions of edges.
Researcher Affiliation Academia AMPLab, EECS at UC Berkeley, σStatistics at UC Berkeley
Pseudocode Yes Algorithm 1 Kwik Cluster with Algorithm 2 C4 & Cluster Wild!
Open Source Code Yes Code available at https://github.com/pxinghao/Parallel Correlation Clustering.
Open Datasets Yes The real graphs listed in Table 1 were each tested with 100 different random orderings. ... DBLP-2011 [25, 26, 27]. ENWiki-2013 [25, 26, 27]. UK-2005 [25, 26, 27]. IT-2004 [25, 26, 27]. Web Base-2001 [25, 26, 27].
Dataset Splits No No specific training, validation, or test dataset splits (e.g., percentages, sample counts, or cross-validation setup) were explicitly provided.
Hardware Specification Yes We ran all our experiments on Amazon EC2 s r3.8xlarge (32 v CPUs, 244Gb memory) instances, using 1-32 threads.
Software Dependencies No The paper states "Our parallel algorithms were all implemented in Scala" but does not specify any version numbers for Scala or other relevant libraries/solvers used.
Experiment Setup Yes Values of ε = 0.1, 0.5, 0.9 were used for C4 BSP, Cluster Wild! BSP and CDK.