Differentially-Private Clustering of Easy Instances
Authors: Edith Cohen, Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical analysis with an empirical evaluation on synthetic data. We implemented in Python our two main algorithms for k-tuple clustering: PrivatekAverages and PrivatekNoisyCenters. We compared the two algorithms in terms of the sample complexity that is needed to privately separate the samples from a given mixture of Gaussians. |
| Researcher Affiliation | Collaboration | 1Google Research 2Blavatnik School of Computer Science, Tel Aviv University 3Ben-Gurion University. |
| Pseudocode | Yes | Algorithm PrivateTestCloseTuples (Figure 1), Algorithm PrivateTestPartition (Figure 2), Algorithm PrivatekAverages (Figure 3), Algorithm PrivatekNoisyCenters (Figure 4). |
| Open Source Code | No | The paper states 'We implemented in Python our two main algorithms' but does not provide any link or explicit statement about making the code open source or available. |
| Open Datasets | No | The paper describes generating 'synthetic data' for its experiments, stating 'we generated each k-tuple by running algorithm k-means++'. It does not refer to a publicly available dataset with concrete access information. (See the data-generation sketch below the table.) |
| Dataset Splits | No | The paper uses synthetic data and discusses sample complexity but does not specify training, validation, and test dataset splits or cross-validation details. |
| Hardware Specification | Yes | All the experiments were run on a MacBook Pro laptop with a 4-core 2.8 GHz Intel i7 CPU and 16 GB of RAM. |
| Software Dependencies | No | The paper mentions 'implemented in Python' and using 'GaussianMixture from the package sklearn.mixture' but does not provide specific version numbers for Python, scikit-learn, or any other software dependencies. |
| Experiment Setup | Yes | In all the experiments we used privacy parameters ε = 1 and δ = e^(-28), and used β = 0.05. In all the tests of PrivatekNoisyCenters, we chose Δ = (10/ε)·k·log(k/δ)·√(log(k/β)), and the number of k-tuples we generated was exactly 3781. In the tests of PrivatekAverages, we chose Λ = 2^10·k·d and r_min = 0.1, and generated each k-tuple from 15·k samples. (See the configuration sketch below the table.) |
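
The synthetic-data pipeline is described only in prose (samples drawn from a mixture of Gaussians; each k-tuple produced by running k-means++ on 15·k fresh samples). The sketch below shows one way such tuples could be generated with scikit-learn; the mixture parameters, the helper names `sample_gaussian_mixture` and `generate_k_tuples`, and the use of `sklearn.cluster.kmeans_plusplus` for the seeding step are assumptions, not the authors' code.

```python
# Sketch of the described synthetic-data pipeline: draw samples from a mixture
# of spherical Gaussians and produce each k-tuple by running k-means++ seeding
# on 15*k fresh samples. Helper names and mixture parameters are illustrative
# assumptions, not the authors' code.
import numpy as np
from sklearn.cluster import kmeans_plusplus


def sample_gaussian_mixture(n, means, std, rng):
    """Draw n points from a uniform mixture of spherical Gaussians with the given means."""
    k, d = means.shape
    labels = rng.integers(0, k, size=n)
    return means[labels] + std * rng.standard_normal((n, d))


def generate_k_tuples(n_tuples, k, means, std, seed=0):
    """Produce n_tuples k-tuples, each the k-means++ seeding of a fresh batch of 15*k samples."""
    rng = np.random.default_rng(seed)
    samples_per_tuple = 15 * k  # as reported in the experiment setup
    tuples = []
    for _ in range(n_tuples):
        X = sample_gaussian_mixture(samples_per_tuple, means, std, rng)
        centers, _ = kmeans_plusplus(X, n_clusters=k, random_state=int(rng.integers(2**31)))
        tuples.append(centers)
    return np.stack(tuples)  # shape: (n_tuples, k, d)


if __name__ == "__main__":
    k, d = 4, 2
    means = np.random.default_rng(1).uniform(-100, 100, size=(k, d))  # well-separated means (assumed)
    tuples = generate_k_tuples(n_tuples=100, k=k, means=means, std=1.0)
    print(tuples.shape)  # (100, 4, 2)
```

Here `kmeans_plusplus` performs only the seeding step, which is how the quoted description ('we generated each k-tuple by running algorithm k-means++') is read; if the paper instead ran full Lloyd iterations, `KMeans(n_clusters=k, init='k-means++')` would be the closer analogue.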
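
The values quoted in the Experiment Setup row can also be gathered into a single configuration object for reference. The following is a minimal sketch under the reconstruction above (δ = e^(-28), Λ = 2^10·k·d, Δ = (10/ε)·k·log(k/δ)·√(log(k/β))); the `ExperimentConfig` class, its field names, and its grouping are assumptions, not part of the paper.

```python
# Sketch collecting the experiment parameters quoted in the Experiment Setup row.
# The ExperimentConfig class and its grouping are illustrative; only the numeric
# choices and formulas restate the reported setup (as reconstructed from the text).
import math
from dataclasses import dataclass


@dataclass
class ExperimentConfig:
    k: int                        # number of clusters (tuple size)
    d: int                        # dimension
    eps: float = 1.0              # privacy parameter epsilon
    delta: float = math.exp(-28)  # privacy parameter delta
    beta: float = 0.05            # confidence/failure parameter
    r_min: float = 0.1            # minimal radius used in the PrivatekAverages tests
    n_tuples: int = 3781          # number of k-tuples in the PrivatekNoisyCenters tests

    @property
    def domain_bound(self) -> float:
        """Lambda = 2^10 * k * d, as used in the PrivatekAverages tests."""
        return 2 ** 10 * self.k * self.d

    @property
    def separation(self) -> float:
        """Delta = (10/eps) * k * log(k/delta) * sqrt(log(k/beta)), as used in the PrivatekNoisyCenters tests."""
        return (10 / self.eps) * self.k * math.log(self.k / self.delta) * math.sqrt(math.log(self.k / self.beta))

    @property
    def samples_per_tuple(self) -> int:
        """Each k-tuple is generated from 15*k samples."""
        return 15 * self.k


cfg = ExperimentConfig(k=4, d=2)
print(cfg.domain_bound, round(cfg.separation, 1), cfg.samples_per_tuple)
```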