Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth

Authors: Laxman Dhulipala, David Eisenstat, Jakub Łącki, Vahab Mirrokni, Jessica Shi

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare with several state-of-the-art sequential and parallel baselines.
Researcher Affiliation | Collaboration | Laxman Dhulipala (University of Maryland and Google Research, laxman@umd.edu); David Eisenstat (Google Research, eisen@google.com); Jakub Łącki (Google Research, jlacki@google.com); Vahab Mirrokni (Google Research, mirrokni@google.com); Jessica Shi (MIT CSAIL, jeshi@mit.edu)
Pseudocode | Yes | Algorithm 1 ParHAC-ContractLayer(G = (V, E, w), T_L, ε, D) (illustrative sketch below)
Open Source Code | Yes | We have made our implementation publicly available on GitHub: https://github.com/ParAlg/ParHAC
Open Datasets | Yes | We evaluate our algorithms on the iris, wine, digits, cancer, and faces classification datasets from the UCI dataset repository (found in the sklearn.datasets package). (loading sketch below)
Dataset Splits | No | The paper discusses the datasets used and the parameters for graph construction (k-NN), but it does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproducibility.
Hardware Specification | Yes | We ran all of our experiments on a 72-core Dell PowerEdge R930 (with two-way hyper-threading) with 4 × 2.4GHz Intel 18-core E7-8867 v4 Xeon processors (with a 4800MHz bus and 45MB L3 cache) and 1TB of main memory.
Software Dependencies | No | The paper mentions software frameworks and libraries such as CPAM, Aspen, scipy, and sklearn.datasets, but it does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | We run the ParHAC and SeqHAC algorithms using ε = 0.1. We compute the k-approximate nearest neighbors using a shared-memory parallel implementation of the Vamana approximate nearest neighbors (ANN) algorithm [45] with parameters R = 75, L = 100, Q = max(L, k). (graph-construction sketch below)
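The Algorithm 1 signature quoted in the Pseudocode row is ParHAC's layer-contraction subroutine, which merges cluster pairs whose linkage similarity is within a (1 + ε) factor of the best available merge. The sketch below is a minimal, sequential illustration of that approximation rule only; it is not the authors' parallel algorithm, and the function name, data layout, and average-linkage choice are assumptions for illustration.

```python
# Minimal sequential sketch of (1 + eps)-approximate average-linkage HAC.
# NOT the paper's parallel ParHAC-ContractLayer: it only illustrates the
# merge rule that any cluster pair whose linkage is within a (1 + eps)
# factor of the current maximum linkage is eligible to merge.

def approx_avg_linkage_hac(n, edges, eps=0.1):
    """n: number of points; edges: dict {(i, j): weight} with i < j."""
    size = {i: 1 for i in range(n)}  # current cluster sizes
    total = dict(edges)              # summed inter-cluster edge weights
    dendrogram = []

    def linkage(a, b):  # weighted average linkage between clusters a and b
        return total[(min(a, b), max(a, b))] / (size[a] * size[b])

    while total:
        best = max(linkage(a, b) for a, b in total)
        # Any pair within a (1 + eps) factor of the best linkage may merge.
        a, b = next((a, b) for a, b in total
                    if linkage(a, b) >= best / (1 + eps))
        dendrogram.append((a, b, linkage(a, b)))
        # Fold cluster b into cluster a, summing parallel edges.
        size[a] += size.pop(b)
        del total[(min(a, b), max(a, b))]
        for (x, y), w in list(total.items()):
            if b in (x, y):
                other = x if y == b else y
                key = (min(a, other), max(a, other))
                total[key] = total.get(key, 0.0) + w
                del total[(x, y)]
    return dendrogram
```

With ε = 0.1, a pair with linkage 0.95 may merge even while a 1.0-linkage pair exists, since 0.95 ≥ 1.0 / 1.1; this slack is what allows the parallel algorithm to perform many merges per round.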
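For the Open Datasets row, all five datasets can be pulled from sklearn.datasets, as sketched below. The loader names for "cancer" (load_breast_cancer) and "faces" (fetch_olivetti_faces) are assumptions, since the paper names the datasets rather than the functions.

```python
# Loading sketch for the evaluation datasets; the mappings for "cancer"
# (load_breast_cancer) and "faces" (fetch_olivetti_faces) are assumptions --
# the paper names the datasets, not the sklearn loader functions.
from sklearn.datasets import (load_iris, load_wine, load_digits,
                              load_breast_cancer, fetch_olivetti_faces)

datasets = {
    "iris": load_iris(),
    "wine": load_wine(),
    "digits": load_digits(),
    "cancer": load_breast_cancer(),    # assumed mapping
    "faces": fetch_olivetti_faces(),   # assumed mapping; downloads on first use
}
for name, ds in datasets.items():
    print(name, ds.data.shape)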
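For the Experiment Setup row, the paper builds its input graph from k-approximate nearest neighbors computed by a parallel Vamana ANN implementation with R = 75, L = 100, Q = max(L, k). The sketch below is a rough stand-in that substitutes sklearn's exact k-NN search for Vamana; the choice of k and the cosine-similarity edge weighting are assumptions.

```python
# Stand-in sketch: the paper uses a parallel Vamana ANN implementation
# (R = 75, L = 100, Q = max(L, k)); here sklearn's exact k-NN replaces the
# approximate search, and the cosine-similarity weights are an assumption.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neighbors import NearestNeighbors

X = load_digits().data
k = 10  # example value only; the paper does not fix k here

nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(X)
dist, idx = nn.kneighbors(X)  # first neighbor is typically the point itself
rows = np.repeat(np.arange(X.shape[0]), k)
cols = idx[:, 1:].ravel()
weights = 1.0 - dist[:, 1:].ravel()  # similarity = 1 - cosine distance

# (u, v, w) edge list of the weighted k-NN graph fed to the HAC algorithms
edges = list(zip(rows, cols, weights))
print(len(edges), "edges")
```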