Locally Private k-Means in One Round

Authors: Alisa Chang, Badih Ghazi, Ravi Kumar, Pasin Manurangsi

ICML 2021

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. "In Section 5, we present some empirical evaluation of our algorithm."
Researcher Affiliation: Industry. "Google Research, Mountain View, CA. Correspondence to: Alisa Chang <alisac@google.com>, Badih Ghazi <badihghazi@gmail.com>, Ravi Kumar <ravi.k53@gmail.com>, Pasin Manurangsi <pasin@google.com>."
Pseudocode: Yes. The paper presents Algorithm 1 (Building the Net Tree), Algorithm 2 (Computing the Threshold), Algorithm 3 (Encoding Algorithm for k-means), and Algorithm 4 (Decoding Algorithm for k-means).
Open Source Code: No. The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets: No. "We use mixtures of Gaussians, which are generated as follows. For a separation parameter r and the number k of clusters, first pick k centers uniformly at random from the sphere of radius slightly less than one (i.e., 1 (r)). Then, for each center, we create n/k points by adding to the center a Gaussian-distributed vector whose expected norm is 1/r. Finally, we project any point that is outside of the unit ball back into it."
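The quoted generation procedure can be sketched in NumPy. This is only an illustration of the described steps, not the authors' code: the exact sphere radius "slightly less than one" is not fully legible in the excerpt, so the `1 - 1/r` factor and the noise scale `1/(r*sqrt(d))` (chosen so the expected noise norm is roughly 1/r) are assumptions.

```python
import numpy as np

def generate_mixture(n, k, d, r, rng=None):
    """Sample a synthetic mixture of Gaussians inside the unit ball,
    following the generation procedure described in the excerpt."""
    rng = np.random.default_rng(rng)
    # Pick k centers uniformly at random on a sphere of radius < 1
    # ("slightly less than one"; 1 - 1/r is an assumption here).
    centers = rng.standard_normal((k, d))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    centers *= max(0.0, 1.0 - 1.0 / r)
    # n/k points per center, plus Gaussian noise whose expected norm
    # is about 1/r (E||N(0, s^2 I_d)|| ~ s*sqrt(d), so s = 1/(r*sqrt(d))).
    s = 1.0 / (r * np.sqrt(d))
    points = np.repeat(centers, n // k, axis=0)
    points = points + s * rng.standard_normal(points.shape)
    # Project any point outside the unit ball back onto it.
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    points /= np.maximum(norms, 1.0)
    return points, centers
```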
Dataset Splits: No. The paper describes generating its own dataset from mixtures of Gaussians but does not provide explicit details on how this data is split into training, validation, or test sets. No percentages, absolute counts, or predefined split references are given.
Hardware Specification: No. No specific hardware details (such as GPU models, CPU models, or memory) are mentioned for the experiments. The closest passage concerns baselines rather than hardware: the authors report trying "some naive baseline algorithms, such as noising each point using Gaussian/Laplace noise and running k-means++ on this noisy dataset. However, they do not produce any meaningful clustering even for n as large as 10^6."
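The naive baseline the authors mention can be sketched as follows. This is a hedged illustration, not their implementation: each point in the unit ball is released with per-coordinate Laplace noise (the scale `2*sqrt(d)/eps` reflects an assumed l1-sensitivity bound of `2*sqrt(d)` for points in the unit ball), and plain Lloyd iterations with random initialization stand in for the k-means++ run described in the paper.

```python
import numpy as np

def noisy_kmeans_baseline(points, k, eps, iters=20, rng=None):
    """Naive local-DP baseline: each user releases x + Laplace noise,
    then the server clusters the noisy points with Lloyd iterations."""
    rng = np.random.default_rng(rng)
    n, d = points.shape
    # Laplace mechanism; scale 2*sqrt(d)/eps is an assumed calibration
    # for per-point l1 sensitivity at most 2*sqrt(d) in the unit ball.
    noisy = points + rng.laplace(scale=2.0 * np.sqrt(d) / eps, size=(n, d))
    # Random initialization (k-means++ seeding omitted for brevity).
    centers = noisy[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        # Assign each noisy point to its nearest center, then recompute.
        dists = np.linalg.norm(noisy[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = noisy[mask].mean(axis=0)
    return centers
```

As the authors note, the Laplace noise at useful privacy levels swamps the cluster structure, which is why this baseline fails even for large n.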
Software Dependencies: No. The introduction mentions "TensorFlow" and "PyTorch" only as examples of widely used libraries, not as dependencies of the authors' implementation. No ancillary software or version numbers are provided for the experiments.
Experiment Setup: Yes. "Implementation Details. We modify our algorithm in several places to make it more practical. First, instead of using nets we use locality-sensitive hashing (LSH)... Details on the choice of parameters can be found in Appendix D."
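The LSH substitution mentioned in the implementation details can be illustrated with a minimal random-projection (SimHash-style) family. This is a generic stand-in, not the specific LSH family or parameters the paper uses (those are in its Appendix D): a point hashes to the sign pattern of m random projections, so nearby points tend to share buckets.

```python
import numpy as np

class RandomProjectionLSH:
    """Minimal SimHash-style LSH: a point's bucket is the sign pattern
    of m random linear projections. Illustrative only; the paper's
    actual LSH parameters are specified in its Appendix D."""

    def __init__(self, d, m, rng=None):
        rng = np.random.default_rng(rng)
        # m random hyperplanes through the origin in R^d.
        self.planes = rng.standard_normal((m, d))

    def hash(self, x):
        # Bucket id = tuple of projection signs (1 if nonnegative).
        return tuple((self.planes @ x >= 0).astype(int))
```

Points that are close in angle map to the same sign pattern with high probability, which is what lets LSH buckets play the role of net cells in a practical implementation.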