Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Differentially Private Clustering in High-Dimensional Euclidean Spaces
Authors: Maria-Florina Balcan, Travis Dick, Yingyu Liang, Wenlong Mou, Hongyang Zhang
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real datasets verify the effectiveness of the proposed method. |
| Researcher Affiliation | Academia | 1Carnegie Mellon University, Pittsburgh, PA, USA 2Princeton University, Princeton, NJ, USA 3Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1: private-partition({x_i}_{i=1}^n, ϵ, δ, Q). Algorithm 2: candidate({x_i}_{i=1}^n, ϵ, δ). Algorithm 3: localswap({x_i}_{i=1}^n, C, ϵ, δ). Algorithm 4: Private Clustering. Algorithm 5: Privately Recover Centers for Sparse Dataset. |
| Open Source Code | No | The paper does not provide concrete access to source code for the described methodology. |
| Open Datasets | Yes | MNIST: the raw pixels of MNIST (LeCun et al., 1998). CIFAR-10: 100k randomly sampled examples from the CIFAR-10 dataset (Krizhevsky, 2009). Synthetic: a synthetic dataset of 100k samples drawn from a mixture of 64 Gaussians in R^100. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as 'k-means++', 'SuLQ k-means', and the 'Gaussian mechanism', but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The implementation of our algorithm projects to a space of dimension p = log(n)/2, rather than 8 log(n), and repeats the candidate set construction routine only k times. Finally, we perform 8 iterations of the SuLQ k-means algorithm to further improve the quality of the resulting centers. These modifications do not affect the privacy guarantee of the algorithm, but gave improved empirical performance. Our implementation of the SuLQ k-means algorithm runs for 20 iterations and uses the Gaussian mechanism to approximate the sum of points in each cluster, since this allowed us to add less noise. The SuLQ algorithm initializes its centers to be k randomly chosen points from the bounding box of the data. Unless otherwise stated, we set ϵ = 1.0. |
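The Experiment Setup row describes a SuLQ-style private k-means loop: centers initialized from the data's bounding box, then repeated Lloyd iterations where per-cluster counts and sums are perturbed via the Gaussian mechanism. The sketch below illustrates that structure only; the function name, the single `noise_scale` parameter standing in for the noise calibrated to the (ϵ, δ) budget, and the center-update guard are all assumptions, not the paper's implementation.

```python
import numpy as np

def sulq_kmeans(X, k, iters=20, noise_scale=1.0, rng=None):
    """Sketch of a SuLQ-style private k-means loop.

    Each iteration assigns points to their nearest center, then perturbs
    the per-cluster counts and coordinate-wise sums with Gaussian noise
    (the Gaussian mechanism) before recomputing centers.  `noise_scale`
    is a placeholder for the scale implied by the (eps, delta) budget;
    calibrating it correctly is out of scope for this sketch.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)
    # Initialize centers uniformly at random from the bounding box of the data.
    centers = rng.uniform(lo, hi, size=(k, d))
    for _ in range(iters):
        # Assign each point to its nearest current center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            # Noisy count and noisy sum of the cluster's points (Gaussian mechanism).
            count = mask.sum() + rng.normal(0.0, noise_scale)
            total = X[mask].sum(axis=0) + rng.normal(0.0, noise_scale, size=d)
            # Only update when the noisy count is usable (hypothetical guard).
            if count >= 1.0:
                centers[j] = total / count
    return centers
```

With small noise this behaves like ordinary Lloyd iterations; larger `noise_scale` trades cluster quality for a tighter privacy budget, which is why the paper reports using the Gaussian mechanism specifically to add less noise.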