Practical Almost-Linear-Time Approximation Algorithms for Hybrid and Overlapping Graph Clustering

Authors: Lorenzo Orecchia, Konstantinos Ameranis, Charalampos Tsourakakis, Kunal Talwar

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Crucially, we implement our approximation algorithm to produce both overlapping and hybrid partitions for large graphs, easily scaling to tens of millions of edges, and test our implementation on real-world datasets against other competitive baselines. ... We evaluate the performance of our proposed method cm+improve on graphs sampled from the Overlapping Stochastic Block Model (Abbe & Sandon, 2015) and on large real-world networks from the SNAP collection (Leskovec & Krevl, 2014). Our results show that cm+improve is competitive or outperforms baselines while scaling to graphs with over 10^7 edges.
Researcher Affiliation Collaboration Konstantinos Ameranis 1 Lorenzo Orecchia 1 Kunal Talwar 2 Charalampos Tsourakakis 3 1Department of Computer Science, University of Chicago, Chicago, USA 2Apple Inc 3Department of Computer Science, Boston University, Boston, USA.
Pseudocode Yes Pseudocode for resulting generic algorithm is given as Algorithm 1.
Open Source Code Yes All assets are currently accessible (sup, 2021) and will become publicly available under the BSD license after publication. ... Our implementation is available online (sup, 2021).
Open Datasets Yes We evaluate the performance of our proposed method cm+improve on graphs sampled from the Overlapping Stochastic Block Model (Abbe & Sandon, 2015) and on large real-world networks from the SNAP collection (Leskovec & Krevl, 2014).
Dataset Splits No The paper discusses evaluating its method on generated graphs (OSBM) and real-world networks (SNAP) but does not specify train, validation, or test dataset splits as typically found in machine learning contexts (e.g., percentages or sample counts for distinct splits).
Hardware Specification Yes All experiments were conducted on an institutional cluster on machines with 24 Cores (2x 24 core Intel Xeon Silver 4116 CPU @ 2.10GHz), 48 threads and 128GB RAM.
Software Dependencies Yes The cut-matching strategy of Theorem 7 is implemented in MATLAB, while the cut-improvement algorithm Hybrid Improve is single-threaded C++, using Goldberg s s-t-maxflow solver HIPR, which implements the push-relabel algorithm (Goldberg; Cherkassky & Goldberg, 1997). ... Goldberg, A. Hipr version 3.7.
Experiment Setup Yes Throughout our evaluation, we set ยต to be the degree measure of the graph, as other methods are already tuned to minimize conductance. For experiments requiring the output to be a balanced overlapping partition, we implemented a simple heuristic modification of Hybrid Improve, by only routing a fraction of the maximum flow in Hybrid Improve, as suggested by Khandekar et al. (2009). Our implementation performs some numerical approximations to minimize the number of calls to HIPR and, as a result, departs slightly from the theoretical description of the Hybrid Improve algorithm. These optimizations are described in the Appendix.