Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Learning-Augmented Streaming Algorithms for Correlation Clustering

Authors: Yinhao Dong, Shan Jiang, Shi Li, Pan Peng

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on synthetic and real-world datasets demonstrate the superiority of our proposed algorithms over their non-learning counterparts. In this section, we evaluate our proposed algorithm for complete graphs empirically on synthetic and real-world datasets. All experiments are conducted on a CPU with an i7-13700H processor and 32 GB RAM. For all results, unless otherwise stated, we report the average clustering cost over 20 independent trials. Our source code is available in the supplementary material.
Researcher Affiliation	Academia	(1) School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui Province, China; (2) School of Computer Science, Nanjing University, Nanjing, Jiangsu Province, China; (3) New Cornerstone Science Laboratory
Pseudocode	Yes	Algorithm 1 An algorithm for complete graphs in dynamic streams Algorithm 2 An algorithm for general graphs in dynamic streams Algorithm 3 Offline version of Algorithm 1 (see Appendix C) Algorithm 4 TRUNCATEDPIVOT (see Appendix C) Algorithm 5 TRUNCATEDPIVOTWITHPRED (see Appendix C) Algorithm 6 CKLPU-PIVOT (see Appendix C) Algorithm 7 PAIRWISEDISS (see Appendix C) Algorithm 8 An algorithm for complete graphs in insertion-only streams (see Appendix E) Algorithm 9 CLUSTER (see Appendix E) Algorithm 10 CM-PIVOT (see Appendix E) Algorithm 11 PAIRWISEDISS2 (see Appendix E) Algorithm 12 PAIRWISEDISS2WITHPREROUNDING (see Appendix E)
Open Source Code	Yes	Our source code is available in the supplementary material.
Open Datasets	Yes	1) Synthetic datasets. These datasets are generated from the Stochastic Block Model (SBM). We use this model to plant ground-truth clusters. 2) Real-world datasets. We use EMAILCORE [70, 94], FACEBOOK [78], LASTFM [84], and DBLP [93] datasets. For simplicity, for all datasets, we only simulate insertion-only streams of edges. We refer to Appendix G.1 for detailed descriptions of the datasets. We provide basic statistics about these datasets in Table 3. Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data, 1(1):2, 2007. ... Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
Dataset Splits	No	No explicit training/validation/test dataset splits are provided. The paper mentions using synthetic datasets from SBM and real-world datasets, and simulating 'insertion-only streams of edges.' For the binary classifier, it mentions 'training a binary classifier' but does not detail the split methodology, percentages, or sample counts for any dataset used.
Hardware Specification	Yes	All experiments are conducted on a CPU with an i7-13700H processor and 32 GB RAM.
Software Dependencies	Yes	We use the powerful LP solver Gurobi [55] to get the optimal clusterings. [55] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023.
Experiment Setup	Yes	1) Synthetic datasets. These datasets are generated from the Stochastic Block Model (SBM). We use this model to plant ground-truth clusters. It samples positive edges between vertex pairs within the same cluster with probability p > 0.5, and samples positive edges across different clusters with probability (1 − p). ... We set n = 100 in (a)-(c) and p = 0.95 in (d). ... For FB 0, we set β = 1.19. For FB 414, we set β = 1.12. For FB 3980, we set β = 1.19. We set k = 25 for (a), k = 15 for (b), k = 10 for (c), and k = 50 for (d). This predictor is constructed by training a binary classifier (based on an MLP model).
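
Illustrative sketch (not the authors' code): the Experiment Setup row describes planting clusters with an SBM, and the Research Type row reports average clustering cost over 20 independent trials. The Python below is a minimal sketch of that idea under stated assumptions. The function names, the uniform random assignment of vertices to clusters, the num_clusters parameter (which is not claimed to be the paper's k), and the use of the disagreement count as the clustering cost are all assumptions made for illustration; only n = 100, p = 0.95, and the 20 trials come from the quoted text.

```python
import random
from itertools import combinations


def sample_sbm_signed_graph(n, num_clusters, p, rng):
    """Sample a complete signed graph with planted clusters.

    Assumption: each vertex gets a cluster label uniformly at random.
    A pair in the same cluster is positive with probability p; a pair in
    different clusters is positive with probability 1 - p. Pairs that are
    not positive are treated as negative.
    """
    labels = [rng.randrange(num_clusters) for _ in range(n)]
    positive = set()
    for u, v in combinations(range(n), 2):
        prob_positive = p if labels[u] == labels[v] else 1.0 - p
        if rng.random() < prob_positive:
            positive.add((u, v))
    return labels, positive


def disagreement_cost(labels, positive):
    """Count pairs whose sign disagrees with the clustering.

    A disagreement is a positive pair split across clusters or a
    negative pair placed inside one cluster.
    """
    cost = 0
    for u, v in combinations(range(len(labels)), 2):
        same_cluster = labels[u] == labels[v]
        is_positive = (u, v) in positive
        if same_cluster != is_positive:
            cost += 1
    return cost


def average_planted_cost(n, num_clusters, p, trials, seed=0):
    """Average cost of the planted clustering over independent trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        labels, positive = sample_sbm_signed_graph(n, num_clusters, p, rng)
        total += disagreement_cost(labels, positive)
    return total / trials


if __name__ == "__main__":
    # n, p, and the trial count follow the quoted setup; num_clusters=4 is hypothetical.
    print(average_planted_cost(n=100, num_clusters=4, p=0.95, trials=20))
```

With p = 1 the planted clustering has zero disagreements, which is a quick sanity check. For p < 1, each of the n(n − 1)/2 pairs disagrees with the planted clustering independently with probability 1 − p, so the expected cost of the planted clustering is (1 − p) · n(n − 1)/2.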