Differentially Private Hierarchical Clustering with Provable Approximation Guarantees
Authors: Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we perform an empirical study of our algorithms and validate their performance.7. Experiments 7.1. Results |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Engineering, UCSD, La Jolla, USA. Work partially done while an intern at Google. 2Google, New York City, USA. |
| Pseudocode | Yes | Algorithm 1 DPHCBlocks, a hierarchical clustering algorithm in the HSBM given the blocks. Algorithm 2 DPCommunity, a community recovery Algorithm. Algorithm 3 DPCluster HSBM a hierarchical clustering algorithm in the HSBM given the blocks. |
| Open Source Code | Yes | To enable the replication of our work, we make the code available open-source 2. 2https://bitbucket.org/jjimola/dphc/src/master/ |
| Open Datasets | Yes | Our real-world graph was generated from the MNIST digits dataset (Le Cun, 1998) Le Cun, Y. The mnist database of handwritten digits. http://yann. lecun. com/exdb/mnist/, 1998. |
| Dataset Splits | No | The paper mentions using the MNIST dataset and synthetic graphs but does not specify the train/validation/test splits, their percentages, or sample counts for any of the datasets used in the experiments. It only mentions averaging results over 5 runs. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'standard scientific computing packages' and cites SciPy, but it does not specify any software dependencies with their corresponding version numbers required to replicate the experiments. |
| Experiment Setup | Yes | We ran algorithms at ϵ {0.5, 1.0, 2.0}, as well as with no privacy. We generated graphs from HSBM(B, P, f) with n = 2048 nodes, k = {4, 8} blocks, with block sizes chosen proportional to {1, γ, . . . , γk 1}, where γk 1 = 3. This has the effect of creating differently-sized blocks. We selected P to be a balanced tree over the blocks, and f that increases uniformly in the interval [0.1, 0.9] as the tree is descended. |