Differentially Private Clustering in High-Dimensional Euclidean Spaces
Authors: Maria-Florina Balcan, Travis Dick, Yingyu Liang, Wenlong Mou, Hongyang Zhang
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real datasets verify the effectiveness of the proposed method. |
| Researcher Affiliation | Academia | 1Carnegie Mellon University, Pittsburgh, PA, USA 2Princeton University, Princeton, NJ, USA 3Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 private-partition({x_i}_{i=1}^n, ϵ, δ, Q). Algorithm 2 candidate({x_i}_{i=1}^n, ϵ, δ). Algorithm 3 localswap({x_i}_{i=1}^n, C, ϵ, δ). Algorithm 4 Private Clustering. Algorithm 5 Privately Recover Centers for Sparse Dataset. |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. |
| Open Datasets | Yes | MNIST: the raw pixels of MNIST (LeCun et al., 1998). CIFAR-10: 100k randomly sampled examples from the CIFAR-10 dataset (Krizhevsky, 2009). Synthetic: a synthetic dataset of 100k samples drawn from a mixture of 64 Gaussians in R^100. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as 'k-means++', 'SuLQ k-means', and the 'Gaussian mechanism', but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The implementation of our algorithm projects to a space of dimension p = log(n)/2, rather than 8 log(n), and repeats the candidate set construction routine only k times. Finally, we perform 8 iterations of the SuLQ k-means algorithm to further improve the quality of the resulting centers. These modifications do not affect the privacy guarantee of the algorithm, but improved empirical performance. Our implementation of the SuLQ k-means algorithm runs for 20 iterations and uses the Gaussian mechanism to approximate the sum of points in each cluster, since this allowed us to add less noise. The SuLQ algorithm initializes its centers to be k randomly chosen points from the bounding box of the data. Unless otherwise stated, we set ϵ = 1.0. |
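The Experiment Setup row describes a SuLQ-style k-means baseline: centers are initialized uniformly from the bounding box of the data, and each Lloyd iteration replaces the exact per-cluster point sums and counts with Gaussian-mechanism estimates. The sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: the per-iteration budget split, the sensitivity-1 data assumption, and the `sulq_kmeans` function name are all hypothetical, as the paper does not spell out these details.

```python
import numpy as np

def sulq_kmeans(X, k, epsilon, delta, iters=20, rng=None):
    """Sketch of SuLQ-style private k-means with the Gaussian mechanism.

    Assumptions (not from the paper): the (epsilon, delta) budget is split
    evenly across iterations, and each point lies in a ball of radius 1 so
    that sums and counts each have sensitivity 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Initialize centers uniformly at random from the data's bounding box,
    # as described in the setup.
    lo, hi = X.min(axis=0), X.max(axis=0)
    centers = rng.uniform(lo, hi, size=(k, d))
    # Per-iteration privacy budget and the standard Gaussian-mechanism scale.
    eps_t, delta_t = epsilon / iters, delta / iters
    sigma = np.sqrt(2.0 * np.log(1.25 / delta_t)) / eps_t
    for _ in range(iters):
        # Assign each point to its nearest current center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            # Noisy count and noisy coordinate-wise sum for cluster j.
            noisy_count = len(members) + rng.normal(0.0, sigma)
            noisy_sum = members.sum(axis=0) + rng.normal(0.0, sigma, size=d)
            if noisy_count > 1.0:
                centers[j] = noisy_sum / noisy_count
    return centers
```

Approximating the noisy *sum* of points (rather than the mean directly) matches the row's remark that this "allowed us to add less noise": the sum has fixed sensitivity, while a released mean would need noise scaled to the worst-case (smallest) cluster size.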