Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning-Augmented Streaming Algorithms for Correlation Clustering

Authors: Yinhao Dong, Shan Jiang, Shi Li, Pan Peng

NeurIPS 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Experimental results on synthetic and real-world datasets demonstrate the superiority of our proposed algorithms over their non-learning counterparts. In this section, we evaluate our proposed algorithm for complete graphs empirically on synthetic and real-world datasets. All experiments are conducted on a CPU with an i7-13700H processor and 32 GB RAM. For all results, unless otherwise stated, we report the average clustering cost over 20 independent trials. Our source code is available in the supplementary material. |
| Researcher Affiliation | Academia | 1) School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui Province, China; 2) School of Computer Science, Nanjing University, Nanjing, Jiangsu Province, China; 3) New Cornerstone Science Laboratory |
| Pseudocode | Yes | Algorithm 1: An algorithm for complete graphs in dynamic streams; Algorithm 2: An algorithm for general graphs in dynamic streams; Algorithm 3: Offline version of Algorithm 1 (see Appendix C); Algorithm 4: TRUNCATEDPIVOT (see Appendix C); Algorithm 5: TRUNCATEDPIVOTWITHPRED (see Appendix C); Algorithm 6: CKLPU-PIVOT (see Appendix C); Algorithm 7: PAIRWISEDISS (see Appendix C); Algorithm 8: An algorithm for complete graphs in insertion-only streams (see Appendix E); Algorithm 9: CLUSTER (see Appendix E); Algorithm 10: CM-PIVOT (see Appendix E); Algorithm 11: PAIRWISEDISS2 (see Appendix E); Algorithm 12: PAIRWISEDISS2WITHPREROUNDING (see Appendix E) |
| Open Source Code | Yes | Our source code is available in the supplementary material. |
| Open Datasets | Yes | 1) Synthetic datasets. These datasets are generated from the Stochastic Block Model (SBM). We use this model to plant ground-truth clusters. 2) Real-world datasets. We use EMAILCORE [70, 94], FACEBOOK [78], LASTFM [84], and DBLP [93] datasets. For simplicity, for all datasets, we only simulate insertion-only streams of edges. We refer to Appendix G.1 for detailed descriptions of the datasets. We provide basic statistics about these datasets in Table 3. Jure Leskovec, Jon M. Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data, 1(1):2, 2007. ... Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014. |
| Dataset Splits | No | No explicit training/validation/test dataset splits are provided. The paper mentions using synthetic datasets from SBM and real-world datasets, and simulating 'insertion-only streams of edges.' For the binary classifier, it mentions 'training a binary classifier' but does not detail the split methodology, percentages, or sample counts for any dataset used. |
| Hardware Specification | Yes | All experiments are conducted on a CPU with an i7-13700H processor and 32 GB RAM. |
| Software Dependencies | Yes | We use the powerful LP solver Gurobi [55] to get the optimal clusterings. [55] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023. |
| Experiment Setup | Yes | 1) Synthetic datasets. These datasets are generated from the Stochastic Block Model (SBM). We use this model to plant ground-truth clusters. It samples positive edges between vertex pairs within the same cluster with probability p > 0.5, and samples positive edges across different clusters with probability (1 − p). ... We set n = 100 in (a)-(c) and p = 0.95 in (d). ... For FB 0, we set β = 1.19. For FB 414, we set β = 1.12. For FB 3980, we set β = 1.19. We set k = 25 for (a), k = 15 for (b), k = 10 for (c), and k = 50 for (d). This predictor is constructed by training a binary classifier (based on an MLP model). |
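
The synthetic setup quoted in the Experiment Setup row can be illustrated with a minimal sketch. It plants clusters, samples a positive edge with probability p inside a cluster and 1 − p across clusters, and averages the correlation clustering cost (the number of disagreements) over 20 trials. This is not the authors' code. The function names, the balanced round-robin cluster assignment, the `num_clusters` parameter, and the seed are all hypothetical choices made for illustration. The sketch scores only the planted clustering and does not reproduce the paper's streaming algorithms or learned predictor.

```python
import itertools
import random


def sample_signed_sbm(n, num_clusters, p, rng):
    """Sample a complete signed graph from a stochastic block model.

    Returns the planted labels and the set of positive edges (u, v), u < v.
    Every pair that is not in the set is treated as a negative edge.
    """
    # Assumption: balanced planted clusters, vertex v goes to cluster v % num_clusters.
    truth = [v % num_clusters for v in range(n)]
    positive = set()
    for u, v in itertools.combinations(range(n), 2):
        # Positive with probability p inside a cluster, 1 - p across clusters.
        prob = p if truth[u] == truth[v] else 1.0 - p
        if rng.random() < prob:
            positive.add((u, v))
    return truth, positive


def disagreement_cost(labels, positive):
    """Count positive edges that are cut plus negative edges kept in one cluster."""
    cost = 0
    for u, v in itertools.combinations(range(len(labels)), 2):
        same_cluster = labels[u] == labels[v]
        is_positive = (u, v) in positive
        if same_cluster != is_positive:
            cost += 1
    return cost


def average_planted_cost(n, num_clusters, p, trials=20, seed=0):
    """Average cost of the planted clustering over independent SBM samples."""
    rng = random.Random(seed)  # fixed seed is an illustrative choice
    total = 0
    for _ in range(trials):
        truth, positive = sample_signed_sbm(n, num_clusters, p, rng)
        total += disagreement_cost(truth, positive)
    return total / trials


if __name__ == "__main__":
    # n = 100 and p = 0.95 appear in the quoted setup; num_clusters = 5 is made up.
    print(average_planted_cost(n=100, num_clusters=5, p=0.95))
```

Two sanity checks follow from the cost definition: a clustering of all singletons pays exactly the number of positive edges, and a single cluster containing every vertex pays exactly the number of negative edges.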