Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time
Authors: Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni, Jessica Shi
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the performance of our algorithms on publicly available datasets, and show that our approach can speed up clustering of point datasets by a factor of 20.7 76.5x. |
| Researcher Affiliation | Collaboration | 1MIT CSAIL 2Google Research. |
| Pseudocode | Yes | Algorithm 1 MERGE(A, B, L) ... Algorithm 2 Graph HAC-Chain(G = (V, E, w), L) ... Algorithm 3 Graph HAC-Heap(G = (V, E, w), L) ... Algorithm 4 FLIPEDGE(A, B) ... Algorithm 5 UPDATEORIENTATION(A, B, EO) |
| Open Source Code | Yes | We have made our implementations publicly available on Github. 2 Footnote 2: https://github.com/Par Alg/gbbs/tree/ master/benchmarks/Clustering/Seq HAC |
| Open Datasets | Yes | We ran this experiment on a collection of large real-world graphs from the SNAP datasets. ... We evaluate our algorithms and the four HAC variants supported by sklearn on the iris, wine, digits, and cancer, and faces classification datasets. ... We use the Fashion-MNIST (764-dimensions), Last.fm (65 dimensions), and NYTimes (256 dimensions) datasets in these experiments. |
| Dataset Splits | No | The paper mentions evaluating on datasets and using 'slices' of datasets, but does not provide specific train/validation/test splits (percentages, counts, or references to predefined splits) for reproducibility. |
| Hardware Specification | Yes | We run our experiments on a 72-core machine with 4 2.4GHz Intel 18-core E7-8867 v4 Xeon processors, and 1TB of memory. |
| Software Dependencies | No | The paper mentions using C++, GBBS, PAM, ScaNN, Scikit-learn, Sci Py, and Fastcluster, but does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | Our approximate average-linkage algorithm (using ϵ = 0.1) |