Differentially Private Hierarchical Clustering with Provable Approximation Guarantees

Authors: Jacob Imola, Alessandro Epasto, Mohammad Mahdian, Vincent Cohen-Addad, Vahab Mirrokni

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we perform an empirical study of our algorithms and validate their performance.7. Experiments 7.1. Results
Researcher Affiliation Collaboration 1Department of Computer Science and Engineering, UCSD, La Jolla, USA. Work partially done while an intern at Google. 2Google, New York City, USA.
Pseudocode Yes Algorithm 1 DPHCBlocks, a hierarchical clustering algorithm in the HSBM given the blocks. Algorithm 2 DPCommunity, a community recovery Algorithm. Algorithm 3 DPCluster HSBM a hierarchical clustering algorithm in the HSBM given the blocks.
Open Source Code Yes To enable the replication of our work, we make the code available open-source 2. 2https://bitbucket.org/jjimola/dphc/src/master/
Open Datasets Yes Our real-world graph was generated from the MNIST digits dataset (Le Cun, 1998) Le Cun, Y. The mnist database of handwritten digits. http://yann. lecun. com/exdb/mnist/, 1998.
Dataset Splits No The paper mentions using the MNIST dataset and synthetic graphs but does not specify the train/validation/test splits, their percentages, or sample counts for any of the datasets used in the experiments. It only mentions averaging results over 5 runs.
Hardware Specification No The paper does not provide any specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies No The paper mentions using 'standard scientific computing packages' and cites SciPy, but it does not specify any software dependencies with their corresponding version numbers required to replicate the experiments.
Experiment Setup Yes We ran algorithms at ϵ {0.5, 1.0, 2.0}, as well as with no privacy. We generated graphs from HSBM(B, P, f) with n = 2048 nodes, k = {4, 8} blocks, with block sizes chosen proportional to {1, γ, . . . , γk 1}, where γk 1 = 3. This has the effect of creating differently-sized blocks. We selected P to be a balanced tree over the blocks, and f that increases uniformly in the interval [0.1, 0.9] as the tree is descended.