Differentially Private Clustering in High-Dimensional Euclidean Spaces

Authors: Maria-Florina Balcan, Travis Dick, Yingyu Liang, Wenlong Mou, Hongyang Zhang

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both synthetic and real datasets verify the effectiveness of the proposed method.
Researcher Affiliation | Academia | 1) Carnegie Mellon University, Pittsburgh, PA, USA; 2) Princeton University, Princeton, NJ, USA; 3) Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1: private partition({x_i}_{i=1}^n, ϵ, δ, Q); Algorithm 2: candidate({x_i}_{i=1}^n, ϵ, δ); Algorithm 3: localswap({x_i}_{i=1}^n, C, ϵ, δ); Algorithm 4: Private Clustering; Algorithm 5: Privately Recover Centers for Sparse Dataset.
Open Source Code | No | The paper does not provide concrete access to source code for the described methodology.
Open Datasets | Yes | MNIST: the raw pixels of MNIST (LeCun et al., 1998). CIFAR-10: 100k randomly sampled examples from the CIFAR-10 dataset (Krizhevsky, 2009). Synthetic: a synthetic dataset of 100k samples drawn from a mixture of 64 Gaussians in R^100.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions software such as k-means++, SuLQ k-means, and the Gaussian mechanism, but does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | The implementation of our algorithm projects to a space of dimension p = log(n)/2, rather than 8 log(n), and repeats the candidate set construction routine only k times. Finally, we perform 8 iterations of the SuLQ k-means algorithm to further improve the quality of the resulting centers. These modifications do not affect the privacy guarantee of the algorithm, but gave improved empirical performance. Our implementation of the SuLQ k-means algorithm runs for 20 iterations and uses the Gaussian mechanism to approximate the sum of points in each cluster, since this allowed us to add less noise. The SuLQ algorithm initializes its centers to be k randomly chosen points from the bounding box of the data. Unless otherwise stated, we set ϵ = 1.0.
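
The Experiment Setup row above outlines the pipeline: a random projection down to p = log(n)/2 dimensions, followed by noisy SuLQ-style k-means iterations whose per-cluster statistics are released through the Gaussian mechanism, with centers initialized from the bounding box of the data. The following Python sketch illustrates that flow under simplifying assumptions; the function names (random_projection, noisy_kmeans), the single noise scale sigma, and the random stand-in data are ours, and the sketch omits the paper's candidate-set construction, local-swap step, and formal (ϵ, δ) noise calibration.

```python
# Hedged sketch of the experimental flow described above: project to
# p = log(n)/2 dimensions, then run SuLQ-style noisy k-means iterations
# in which each cluster's sum and count are perturbed with Gaussian noise.
# This is an illustration, not the authors' implementation.
import numpy as np

def random_projection(X, p, rng):
    """Project n x d data onto p dimensions with a Gaussian JL-style map."""
    d = X.shape[1]
    R = rng.normal(size=(d, p)) / np.sqrt(p)
    return X @ R

def noisy_kmeans(X, k, iters=8, sigma=1.0, rng=None):
    """SuLQ-style k-means: each iteration releases noisy cluster sums/counts."""
    rng = rng if rng is not None else np.random.default_rng(0)
    n, p = X.shape
    # Initialize centers uniformly at random inside the bounding box of the data.
    lo, hi = X.min(axis=0), X.max(axis=0)
    centers = rng.uniform(lo, hi, size=(k, p))
    for _ in range(iters):
        # Assign every point to its nearest current center.
        d2 = (X ** 2).sum(axis=1)[:, None] - 2.0 * X @ centers.T \
             + (centers ** 2).sum(axis=1)[None, :]
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            # Gaussian-mechanism-style perturbation of the per-cluster count and sum
            # (sigma is a free illustrative parameter, not a calibrated scale).
            noisy_count = max(members.shape[0] + rng.normal(scale=sigma), 1.0)
            noisy_sum = members.sum(axis=0) + rng.normal(scale=sigma, size=p)
            centers[j] = noisy_sum / noisy_count
    return centers

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d, k = 100_000, 100, 64
    X = rng.normal(size=(n, d))        # stand-in data, not the paper's mixture of 64 Gaussians
    p = max(1, int(np.log(n) / 2))     # projected dimension p = log(n)/2
    centers = noisy_kmeans(random_projection(X, p, rng), k=k, iters=8, sigma=1.0, rng=rng)
    print(centers.shape)               # (64, p)
```

In the actual algorithm, the noise added to each sum and count query would be calibrated to the query's sensitivity so that the composed releases satisfy the stated (ϵ, δ) budget; here sigma is left as a free parameter purely for illustration.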