Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Practical Almost-Linear-Time Approximation Algorithms for Hybrid and Overlapping Graph Clustering

Authors: Lorenzo Orecchia, Konstantinos Ameranis, Charalampos Tsourakakis, Kunal Talwar

ICML 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Crucially, we implement our approximation algorithm to produce both overlapping and hybrid partitions for large graphs, easily scaling to tens of millions of edges, and test our implementation on real-world datasets against other competitive baselines. ... We evaluate the performance of our proposed method cm+improve on graphs sampled from the Overlapping Stochastic Block Model (Abbe & Sandon, 2015) and on large real-world networks from the SNAP collection (Leskovec & Krevl, 2014). Our results show that cm+improve is competitive or outperforms baselines while scaling to graphs with over 10^7 edges.
Researcher Affiliation	Collaboration	Konstantinos Ameranis 1 Lorenzo Orecchia 1 Kunal Talwar 2 Charalampos Tsourakakis 3 1Department of Computer Science, University of Chicago, Chicago, USA 2Apple Inc 3Department of Computer Science, Boston University, Boston, USA.
Pseudocode	Yes	Pseudocode for resulting generic algorithm is given as Algorithm 1.
Open Source Code	Yes	All assets are currently accessible (sup, 2021) and will become publicly available under the BSD license after publication. ... Our implementation is available online (sup, 2021).
Open Datasets	Yes	We evaluate the performance of our proposed method cm+improve on graphs sampled from the Overlapping Stochastic Block Model (Abbe & Sandon, 2015) and on large real-world networks from the SNAP collection (Leskovec & Krevl, 2014).
Dataset Splits	No	The paper discusses evaluating its method on generated graphs (OSBM) and real-world networks (SNAP) but does not specify train, validation, or test dataset splits as typically found in machine learning contexts (e.g., percentages or sample counts for distinct splits).
Hardware Specification	Yes	All experiments were conducted on an institutional cluster on machines with 24 Cores (2x 24 core Intel Xeon Silver 4116 CPU @ 2.10GHz), 48 threads and 128GB RAM.
Software Dependencies	Yes	The cut-matching strategy of Theorem 7 is implemented in MATLAB, while the cut-improvement algorithm Hybrid Improve is single-threaded C++, using Goldberg s s-t-maxflow solver HIPR, which implements the push-relabel algorithm (Goldberg; Cherkassky & Goldberg, 1997). ... Goldberg, A. Hipr version 3.7.
Experiment Setup	Yes	Throughout our evaluation, we set µ to be the degree measure of the graph, as other methods are already tuned to minimize conductance. For experiments requiring the output to be a balanced overlapping partition, we implemented a simple heuristic modification of Hybrid Improve, by only routing a fraction of the maximum flow in Hybrid Improve, as suggested by Khandekar et al. (2009). Our implementation performs some numerical approximations to minimize the number of calls to HIPR and, as a result, departs slightly from the theoretical description of the Hybrid Improve algorithm. These optimizations are described in the Appendix.