Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
On the Price of Differential Privacy for Hierarchical Clustering
Authors: Chengyuan Deng, Jie Gao, Jalaj Upadhyay, Chen Wang, Samson Zhou
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our new algorithm HIERARCHICALCLUSTERING-DP and demonstrate the experimental results in this section. First, we evaluate the performance of our algorithm in terms of Dasgupta s objective on synthetic and real-world graphs. Next, we show our algorithm has favorable scalability with large graphs (n 1000). |
| Researcher Affiliation | Academia | Rutgers University, EMAIL Texas A&M University, EMAIL |
| Pseudocode | Yes | BALANCEDSPARSESTCUT-DP: An ε-DP Algorithm for Balanced Sparsest Cut Input: Graph G = (V, E, w), |V | = n, privacy parameter ε > 0. 1. e E, add 10 log n/ε to the edge weight w(e), denoted as G = (V, E, w ) 2. e E, add independent noise Lap(1/ε) to w (e), denoted as G = (V, E, w ). 3. Run the algorithm from Proposition 3.1 on G and return the cut S. |
| Open Source Code | Yes | Our codes are publicly available on the Anonymous Github2. 2https://anonymous.4open.science/r/dp-hc-8612/ |
| Open Datasets | Yes | we generate two datasets of synthetic graphs from the stochastic block model (SBM) and hierarchical SBM (HSBM), then we select real-world datasets: IRIS, WINE and BOSTON from the scikit-learn package (Pedregosa et al., 2011). |
| Dataset Splits | No | The paper does not explicitly mention training/testing/validation dataset splits, only how synthetic graphs were generated and real-world similarity graphs constructed. For synthetic graphs, it mentions '10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability)'. For real-world graphs, it describes 'similarity graph is constructed via the Gaussian kernel' but no splits. |
| Hardware Specification | Yes | All of our experiments are implemented on a Macbook Pro with M2 CPU. |
| Software Dependencies | No | The paper mentions 'scikit-learn package' but does not specify its version number. No other software dependencies are listed with specific versions. |
| Experiment Setup | Yes | The results shown in this section are based on 10 different graphs generated by the same set of parameters where p = 0.7 (intra-probability) and q = 0.1 (interprobability). ... As for the choice of ε, the privacy parameter, we test all algorithms on ε {0.01, 0.1, 0.5, 1, 2}. |