Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth
Authors: Laxman Dhulipala, David Eisenstat, Jakub Lacki, Vahab Mirrokni, Jessica Shi
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with a comprehensive study of the ParHAC algorithm in terms of its scalability, performance, and quality, and compare with several state-of-the-art sequential and parallel baselines. |
| Researcher Affiliation | Collaboration | Laxman Dhulipala University of Maryland and Google Research EMAIL David Eisenstat Google Research EMAIL Jakub Łącki Google Research EMAIL Vahab Mirrokni Google Research EMAIL Jessica Shi MIT CSAIL EMAIL |
| Pseudocode | Yes | Algorithm 1 ParHAC-ContractLayer(G = (V, E, w), TL, ϵ, D) |
| Open Source Code | Yes | We have made our implementation publicly available on GitHub: https://github.com/ParAlg/ParHAC |
| Open Datasets | Yes | We evaluate our algorithms on the iris, wine, digits, cancer, and faces classification datasets from the UCI dataset repository (found in the sklearn.datasets package). |
| Dataset Splits | No | The paper discusses the datasets used and parameters for graph construction (k-NN), but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproducibility. |
| Hardware Specification | Yes | We ran all of our experiments on a 72-core Dell PowerEdge R930 (with two-way hyper-threading) with four 2.4 GHz Intel 18-core E7-8867 v4 Xeon processors (with a 4800 MHz bus and 45 MB L3 cache) and 1 TB of main memory. |
| Software Dependencies | No | The paper mentions software frameworks and libraries like CPAM, Aspen, scipy, and sklearn.datasets, but it does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | We run the ParHAC and SeqHAC algorithms using ϵ = 0.1. We compute the k-approximate nearest neighbors using a shared-memory parallel implementation of the Vamana approximate nearest neighbors (ANN) algorithm [45] with parameters R = 75, L = 100, Q = max(L, k). |
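The setup above runs ParHAC and SeqHAC with ϵ = 0.1 on a k-NN similarity graph. To illustrate roughly what ϵ controls, here is a minimal sequential sketch in Python. It assumes average linkage (total edge weight between two clusters divided by the product of their sizes) and the relaxation that any edge whose linkage is within a (1 + ϵ) factor of the current heaviest linkage may be merged. It is not the authors' ParHAC, which merges many such edges in parallel. The brute-force `knn_graph` is a hypothetical stand-in for Vamana ANN, and its similarity 1 / (1 + distance) and both function names are illustrative assumptions.

```python
import math


def knn_graph(points, k):
    """Brute-force k-NN similarity graph (hypothetical stand-in for Vamana ANN).

    Similarity 1 / (1 + Euclidean distance) is an illustrative assumption.
    Returns a symmetric adjacency map {u: {v: weight}}.
    """
    n = len(points)
    graph = {u: {} for u in range(n)}
    for u in range(n):
        # Sort all other points by distance and keep the k closest.
        nearest = sorted(
            (math.dist(points[u], points[v]), v) for v in range(n) if v != u
        )
        for d, v in nearest[:k]:
            w = 1.0 / (1.0 + d)
            graph[u][v] = w
            graph[v][u] = w
    return graph


def approx_average_linkage_hac(graph, eps):
    """Sequential (1 + eps)-approximate average-linkage graph HAC (sketch).

    Each step merges some edge whose linkage is at least w_max / (1 + eps),
    where w_max is the current heaviest linkage. Returns (a, b, linkage)
    triples; the merged cluster keeps id a. The input graph is not modified.
    """
    size = {u: 1 for u in graph}
    # total[a][b] = sum of original edge weights between clusters a and b.
    total = {u: dict(nbrs) for u, nbrs in graph.items()}
    merges = []

    def linkage(a, b):
        return total[a][b] / (size[a] * size[b])

    while True:
        edges = [(linkage(a, b), a, b) for a in total for b in total[a] if a < b]
        if not edges:
            return merges
        w_max = max(w for w, _, _ in edges)
        # Any edge within a (1 + eps) factor of the max is acceptable;
        # taking the first one found makes the relaxation visible.
        w, a, b = next(e for e in edges if e[0] * (1 + eps) >= w_max)
        merges.append((a, b, w))
        # Fold cluster b into a, summing weights to shared neighbours.
        for c, wt in total.pop(b).items():
            del total[c][b]
            if c != a:
                total[a][c] = total[a].get(c, 0.0) + wt
                total[c][a] = total[a][c]
        size[a] += size.pop(b)
```

For example, `approx_average_linkage_hac(knn_graph(points, k=1), eps=0.1)` on four points forming two tight, well-separated pairs merges each pair once and then stops, because a 1-NN graph has no edges between the pairs. With ϵ = 0, the sketch reduces to exact sequential average-linkage graph HAC.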