reproducibilityindex.ai

Differentially-Private Clustering of Easy Instances

Authors: Edith Cohen, Haim Kaplan, Yishay Mansour, Uri Stemmer, Eliad Tsfadia

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We complement our theoretical analysis with an empirical evaluation on synthetic data. We implemented in Python our two main algorithms for k-tuple clustering: PrivatekAverages and PrivatekNoisyCenters. We compared the two algorithms in terms of the sample complexity that is needed to privately separate the samples from a given mixture of Gaussians.
Researcher Affiliation	Collaboration	(1) Google Research; (2) Blavatnik School of Computer Science, Tel Aviv University; (3) Ben-Gurion University.
Pseudocode	Yes	Algorithm PrivateTestCloseTuples (Figure 1), Algorithm PrivateTestPartition (Figure 2), Algorithm PrivatekAverages (Figure 3), Algorithm PrivatekNoisyCenters (Figure 4).
Open Source Code	No	The paper states 'We implemented in Python our two main algorithms' but does not provide any link or explicit statement about making the code open source or available.
Open Datasets	No	The paper describes generating 'synthetic data' for its experiments, stating 'we generated each k-tuple by running algorithm k-means++'. It does not refer to a publicly available dataset with concrete access information.
Dataset Splits	No	The paper uses synthetic data and discusses sample complexity but does not specify training, validation, and test dataset splits or cross-validation details.
Hardware Specification	Yes	All the experiments were tested on a MacBook Pro laptop with a 4-core Intel i7 CPU at 2.8 GHz and 16 GB of RAM.
Software Dependencies	No	The paper mentions 'implemented in Python' and using 'GaussianMixture from the package sklearn.mixture' but does not provide specific version numbers for Python, scikit-learn, or any other software dependencies.
Experiment Setup	Yes	In all the experiments we used privacy parameters ε = 1 and δ = e^(-28), and used β = 0.05. In all the tests of PrivatekNoisyCenters, we chose Δ = (10/ε) · k · log(k/δ) · √(log(k/β)); the number of k-tuples that we generated was exactly 3781. In the tests of PrivatekAverages, we chose Λ = 2^10 · k · d and r_min = 0.1; we generated each k-tuple using 15 · k samples.
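
The parameter choices quoted in the Experiment Setup row can be made concrete with a short calculation. The sketch below is not the authors' code (none is released). It assumes that δ = e^(-28), that the PrivatekNoisyCenters parameter is (10/ε) · k · log(k/δ) · √(log(k/β)) with natural logarithms, and that Λ = 2^10 · k · d. The function names and the example values of k and d are hypothetical.

```python
import math

# Values quoted in the Experiment Setup row.
EPSILON = 1.0            # privacy parameter epsilon
DELTA = math.exp(-28)    # privacy parameter delta (assumed reading: e^(-28))
BETA = 0.05              # confidence parameter beta
NUM_TUPLES = 3781        # number of k-tuples for PrivatekNoisyCenters
R_MIN = 0.1              # r_min used for PrivatekAverages


def noisy_centers_parameter(k, eps=EPSILON, delta=DELTA, beta=BETA):
    """Assumed formula: (10/eps) * k * log(k/delta) * sqrt(log(k/beta)).

    Natural logarithms are an assumption; the quoted text does not
    state the base.
    """
    return (10.0 / eps) * k * math.log(k / delta) * math.sqrt(math.log(k / beta))


def averages_domain_bound(k, d):
    """Assumed formula: Lambda = 2^10 * k * d."""
    return 2 ** 10 * k * d


def samples_per_tuple(k):
    """Each k-tuple is generated from 15 * k samples (PrivatekAverages tests)."""
    return 15 * k


if __name__ == "__main__":
    # Hypothetical example: k = 4 clusters in d = 8 dimensions.
    k, d = 4, 8
    print("noisy-centers parameter:", noisy_centers_parameter(k))
    print("Lambda:", averages_domain_bound(k, d))
    print("samples per k-tuple:", samples_per_tuple(k))
```

Because the paper gives no software versions, anyone re-running such a calculation alongside scikit-learn's GaussianMixture would need to record their own Python and scikit-learn versions.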