Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
Authors: Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni, Jessica Shi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare with several state-of-the-art sequential and parallel baselines. |
| Researcher Affiliation | Collaboration | Laxman Dhulipala, University of Maryland and Google Research, laxman@umd.edu; David Eisenstat, Google Research, eisen@google.com; Jakub Łącki, Google Research, jlacki@google.com; Vahab Mirrokni, Google Research, mirrokni@google.com; Jessica Shi, MIT CSAIL, jeshi@mit.edu |
| Pseudocode | Yes | Algorithm 1 ParHAC-ContractLayer(G = (V, E, w), TL, ϵ, D) |
| Open Source Code | Yes | We have made our implementation publicly available on GitHub: https://github.com/ParAlg/ParHAC |
| Open Datasets | Yes | We evaluate our algorithms on the iris, wine, digits, cancer, and faces classification datasets from the UCI dataset repository (found in the sklearn.datasets package). |
| Dataset Splits | No | The paper discusses the datasets used and parameters for graph construction (k-NN), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproducibility. |
| Hardware Specification | Yes | We ran all of our experiments on a 72-core Dell PowerEdge R930 (with two-way hyper-threading) with 4 x 2.4GHz Intel 18-core E7-8867 v4 Xeon processors (with a 4800MHz bus and 45MB L3 cache) and 1TB of main memory. |
| Software Dependencies | No | The paper mentions software frameworks and libraries like CPAM, Aspen, scipy, and sklearn.datasets, but it does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We run the ParHAC and SeqHAC algorithms using ϵ = 0.1. We compute the k-approximate nearest neighbors using a shared-memory parallel implementation of the Vamana approximate nearest neighbors (ANN) algorithm [45] with parameters R = 75, L = 100, Q = max(L, k). |
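As a rough illustration of the dataset-preparation step described above, the sketch below loads the named sklearn classification datasets and builds a weighted k-NN graph of the kind the paper clusters. This is a hedged approximation: the paper uses the Vamana ANN algorithm (R = 75, L = 100, Q = max(L, k)), whereas this sketch substitutes sklearn's exact `kneighbors_graph`, and the value k = 10 is an assumed placeholder, not a parameter taken from the paper.

```python
# Hedged sketch: build k-NN graphs over the sklearn datasets named in the
# paper (iris, wine, digits, breast cancer). The "faces" dataset requires a
# network download (fetch_olivetti_faces) and is omitted here.
# NOTE: the paper computes *approximate* neighbors with Vamana; sklearn's
# exact kneighbors_graph is used here only as an illustrative stand-in.
from sklearn.datasets import load_iris, load_wine, load_digits, load_breast_cancer
from sklearn.neighbors import kneighbors_graph

datasets = {
    "iris": load_iris(),
    "wine": load_wine(),
    "digits": load_digits(),
    "cancer": load_breast_cancer(),
}

k = 10  # assumed neighbor count for illustration; not specified in this excerpt

graphs = {}
for name, ds in datasets.items():
    # mode="distance" stores edge weights as pairwise distances; HAC
    # implementations typically convert these to similarities before clustering.
    graphs[name] = kneighbors_graph(ds.data, n_neighbors=k, mode="distance")
    print(name, graphs[name].shape, graphs[name].nnz)
```

Each resulting graph is a sparse n x n adjacency matrix with k outgoing edges per point, which matches the k-NN graph-construction setting the reproducibility table refers to.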