Differentially-Private Clustering of Easy Instances

Authors: Edith Cohen, Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We complement our theoretical analysis with an empirical evaluation on synthetic data. We implemented in Python our two main algorithms for k-tuple clustering: Privatek Averages and Privatek Noisy Centers. We compared the two algorithms in terms of the sample complexity that is needed to privately separate the samples from a given mixture of Gaussians.
Researcher Affiliation Collaboration 1Google Research 2Blavatnik School of Computer Science, Tel Aviv University 3Ben-Gurion University.
Pseudocode Yes Algorithm Private Test Close Tuples (Figure 1), Algorithm Private Test Partition (Figure 2), Algorithm Privatek Averages (Figure 3), Algorithm Privatek Noisy Centers (Figure 4).
Open Source Code No The paper states 'We implemented in Python our two main algorithms' but does not provide any link or explicit statement about making the code open source or available.
Open Datasets No The paper describes generating 'synthetic data' for its experiments, stating 'we generated each k-tuple by running algorithm k-means++'. It does not refer to a publicly available dataset with concrete access information.
Dataset Splits No The paper uses synthetic data and discusses sample complexity but does not specify training, validation, and test dataset splits or cross-validation details.
Hardware Specification Yes All the experiments were tested in a Mac Book Pro Laptop with 4-core Intel i7 CPU with 2.8GHz, and with 16GB RAM.
Software Dependencies No The paper mentions 'implemented in Python' and using 'Gaussian Mixture from the package sklearn.mixture' but does not provide specific version numbers for Python, scikit-learn, or any other software dependencies.
Experiment Setup Yes In all the experiments we used privacy parameters ε = 1 and δ = e 28, and used β = 0.05. In all the tests of Privatek Noisy Centers, we chose = 10 ε k log(k/δ) p log(k/β), the number of k-tuples that we generated was exactly 3781. In the tests of Privatek Averages, we chose Λ = 210 k d and rmin = 0.1, we generated each k-tuple using 15 k samples.