Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Parallel Correlation Clustering on Big Graphs
Authors: Xinghao Pan, Dimitris Papailiopoulos, Samet Oymak, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan
NeurIPS 2015 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally that both algorithms outperform the state of the art, both in terms of clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15× speedup. ... In our experimental evaluation, we demonstrate that both algorithms gracefully scale up to graphs with billions of edges. |
| Researcher Affiliation | Academia | AMPLab, EECS at UC Berkeley, Statistics at UC Berkeley |
| Pseudocode | Yes | Algorithm 1 KwikCluster with Algorithm 2 C4 & ClusterWild! |
| Open Source Code | Yes | Code available at https://github.com/pxinghao/ParallelCorrelationClustering. |
| Open Datasets | Yes | The real graphs listed in Table 1 were each tested with 100 different random orderings. ... DBLP-2011 [25, 26, 27]. ENWiki-2013 [25, 26, 27]. UK-2005 [25, 26, 27]. IT-2004 [25, 26, 27]. WebBase-2001 [25, 26, 27]. |
| Dataset Splits | No | No specific training, validation, or test dataset splits (e.g., percentages, sample counts, or cross-validation setup) were explicitly provided. |
| Hardware Specification | Yes | We ran all our experiments on Amazon EC2's r3.8xlarge (32 vCPUs, 244Gb memory) instances, using 1-32 threads. |
| Software Dependencies | No | The paper states "Our parallel algorithms were all implemented in Scala" but does not specify any version numbers for Scala or other relevant libraries/solvers used. |
| Experiment Setup | Yes | Values of ε = 0.1, 0.5, 0.9 were used for C4 BSP, ClusterWild! BSP and CDK. |