Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Differentially Private Hierarchical Clustering with Provable Approximation Guarantees
Authors: Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we perform an empirical study of our algorithms and validate their performance.7. Experiments 7.1. Results |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Engineering, UCSD, La Jolla, USA. Work partially done while an intern at Google. 2Google, New York City, USA. |
| Pseudocode | Yes | Algorithm 1 DPHCBlocks, a hierarchical clustering algorithm in the HSBM given the blocks. Algorithm 2 DPCommunity, a community recovery Algorithm. Algorithm 3 DPCluster HSBM a hierarchical clustering algorithm in the HSBM given the blocks. |
| Open Source Code | Yes | To enable the replication of our work, we make the code available open-source 2. 2https://bitbucket.org/jjimola/dphc/src/master/ |
| Open Datasets | Yes | Our real-world graph was generated from the MNIST digits dataset (Le Cun, 1998) Le Cun, Y. The mnist database of handwritten digits. http://yann. lecun. com/exdb/mnist/, 1998. |
| Dataset Splits | No | The paper mentions using the MNIST dataset and synthetic graphs but does not specify the train/validation/test splits, their percentages, or sample counts for any of the datasets used in the experiments. It only mentions averaging results over 5 runs. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'standard scientific computing packages' and cites SciPy, but it does not specify any software dependencies with their corresponding version numbers required to replicate the experiments. |
| Experiment Setup | Yes | We ran algorithms at ϵ {0.5, 1.0, 2.0}, as well as with no privacy. We generated graphs from HSBM(B, P, f) with n = 2048 nodes, k = {4, 8} blocks, with block sizes chosen proportional to {1, γ, . . . , γk 1}, where γk 1 = 3. This has the effect of creating differently-sized blocks. We selected P to be a balanced tree over the blocks, and f that increases uniformly in the interval [0.1, 0.9] as the tree is descended. |