reproducibilityindex.ai

Parallel Correlation Clustering on Big Graphs

Authors: Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan

NeurIPS 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate experimentally that both algorithms outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15× speedup. ... In our experimental evaluation, we demonstrate that both algorithms gracefully scale up to graphs with billions of edges.
Researcher Affiliation	Academia	AMPLab, EECS at UC Berkeley, Statistics at UC Berkeley
Pseudocode	Yes	Algorithm 1: KwikCluster with π; Algorithm 2: C4 & ClusterWild!
Open Source Code	Yes	Code available at https://github.com/pxinghao/ParallelCorrelationClustering.
Open Datasets	Yes	The real graphs listed in Table 1 were each tested with 100 different random orderings. ... DBLP-2011 [25, 26, 27]. ENWiki-2013 [25, 26, 27]. UK-2005 [25, 26, 27]. IT-2004 [25, 26, 27]. WebBase-2001 [25, 26, 27].
Dataset Splits	No	No specific training, validation, or test dataset splits (e.g., percentages, sample counts, or cross-validation setup) were explicitly provided.
Hardware Specification	Yes	We ran all our experiments on Amazon EC2's r3.8xlarge (32 vCPUs, 244Gb memory) instances, using 1-32 threads.
Software Dependencies	No	The paper states "Our parallel algorithms were all implemented in Scala" but does not specify any version numbers for Scala or other relevant libraries/solvers used.
Experiment Setup	Yes	Values of ε = 0.1, 0.5, 0.9 were used for C4 BSP, ClusterWild! BSP and CDK.
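
For readers unfamiliar with the serial baseline named in the Pseudocode row, the following is a minimal Python sketch of KwikCluster as it is commonly described: visit vertices in a random order π, make each still-unclustered vertex a cluster center, and assign its still-unclustered positive-edge neighbors to it. This is not the authors' Scala implementation. The function names `kwik_cluster` and `random_order`, the adjacency-dict input format, and the toy graph are assumptions made for illustration only.

```python
import random


def random_order(vertices, seed):
    """Return a seeded random permutation (pi) of the vertices.

    Hypothetical helper: the seed makes each ordering reproducible.
    """
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)
    return order


def kwik_cluster(adjacency, order):
    """Serial KwikCluster sketch.

    adjacency: dict mapping each vertex to the set of its positive-edge
               neighbours (assumed input format).
    order:     the permutation pi, as a list of vertices.
    Returns a dict mapping each vertex to its cluster center.
    """
    cluster_of = {}
    for v in order:
        if v in cluster_of:
            continue  # v was already absorbed by an earlier center
        cluster_of[v] = v  # v becomes the center of a new cluster
        for u in adjacency[v]:
            if u not in cluster_of:
                cluster_of[u] = v  # unclustered neighbours join v
    return cluster_of


# Toy graph (illustrative only): a path 1-2-3 and an edge 4-5.
graph = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}}
print(kwik_cluster(graph, [2, 1, 3, 4, 5]))
print(kwik_cluster(graph, random_order(graph, seed=0)))
```

The result depends on the ordering: starting from vertex 2 groups the whole path 1-2-3 together, while starting from vertex 1 leaves vertex 3 as a singleton. This sensitivity is presumably why the evaluation repeats each graph over many random orderings.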