Locally Private k-Means in One Round

Authors: Alisa Chang, Badih Ghazi, Ravi Kumar, Pasin Manurangsi

ICML 2021

Reproducibility variables, results, and supporting LLM responses:
Research Type: Experimental. "In Section 5, we present some empirical evaluation of our algorithm."
Researcher Affiliation: Industry. "Google Research, Mountain View, CA. Correspondence to: Alisa Chang <alisac@google.com>, Badih Ghazi <badihghazi@gmail.com>, Ravi Kumar <ravi.k53@gmail.com>, Pasin Manurangsi <pasin@google.com>."
Pseudocode: Yes. The paper presents Algorithm 1 (Building the Net Tree), Algorithm 2 (Computing the Threshold), Algorithm 3 (Encoding Algorithm for k-means), and Algorithm 4 (Decoding Algorithm for k-means).
Open Source Code: No. The paper does not contain any explicit statement about releasing source code or a link to a code repository.
Open Datasets: No. "We use mixtures of Gaussians, which are generated as follows. For a separation parameter r and the number k of clusters, first pick k centers uniformly at random from the sphere of radius slightly less than one (i.e., 1 (r)). Then, for each center, we create n/k points by adding to the center a Gaussian-distributed vector whose expected norm is 1/r. Finally, we project any point that is outside of the unit ball back into it."
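The quoted generation procedure can be sketched in NumPy. This is only an illustration of the described steps, not the authors' code: the exact sphere radius "slightly less than one" is not fully legible in the excerpt, so the `1 - 1/r` factor and the noise scale `1/(r*sqrt(d))` (chosen so the expected noise norm is roughly 1/r) are assumptions.

```python
import numpy as np

def generate_mixture(n, k, d, r, rng=None):
    """Sample a synthetic mixture of Gaussians inside the unit ball,
    following the generation procedure described in the excerpt."""
    rng = np.random.default_rng(rng)
    # Pick k centers uniformly at random on a sphere of radius < 1
    # ("slightly less than one"; 1 - 1/r is an assumption here).
    centers = rng.standard_normal((k, d))
    centers /= np.linalg.norm(centers, axis=1, keepdims=True)
    centers *= max(0.0, 1.0 - 1.0 / r)
    # n/k points per center, plus Gaussian noise whose expected norm
    # is about 1/r (E||N(0, s^2 I_d)|| ~ s*sqrt(d), so s = 1/(r*sqrt(d))).
    s = 1.0 / (r * np.sqrt(d))
    points = np.repeat(centers, n // k, axis=0)
    points = points + s * rng.standard_normal(points.shape)
    # Project any point outside the unit ball back onto it.
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    points /= np.maximum(norms, 1.0)
    return points, centers
```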
Dataset Splits: No. The paper describes generating its own dataset from mixtures of Gaussians but does not provide explicit details on how this data is split into training, validation, or test sets. No percentages, absolute counts, or predefined split references are given.
Hardware Specification: No. No specific hardware details (such as GPU models, CPU models, or memory) are mentioned for the experiments. The closest passage concerns baselines rather than hardware: the authors report trying "some naive baseline algorithms, such as noising each point using Gaussian/Laplace noise and running k-means++ on this noisy dataset. However, they do not produce any meaningful clustering even for n as large as 10^6."
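The naive baseline the authors mention can be sketched as follows. This is a hedged illustration, not their implementation: each point in the unit ball is released with per-coordinate Laplace noise (the scale `2*sqrt(d)/eps` reflects an assumed l1-sensitivity bound of `2*sqrt(d)` for points in the unit ball), and plain Lloyd iterations with random initialization stand in for the k-means++ run described in the paper.

```python
import numpy as np

def noisy_kmeans_baseline(points, k, eps, iters=20, rng=None):
    """Naive local-DP baseline: each user releases x + Laplace noise,
    then the server clusters the noisy points with Lloyd iterations."""
    rng = np.random.default_rng(rng)
    n, d = points.shape
    # Laplace mechanism; scale 2*sqrt(d)/eps is an assumed calibration
    # for per-point l1 sensitivity at most 2*sqrt(d) in the unit ball.
    noisy = points + rng.laplace(scale=2.0 * np.sqrt(d) / eps, size=(n, d))
    # Random initialization (k-means++ seeding omitted for brevity).
    centers = noisy[rng.choice(n, size=k, replace=False)]
    for _ in range(iters):
        # Assign each noisy point to its nearest center, then recompute.
        dists = np.linalg.norm(noisy[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = noisy[mask].mean(axis=0)
    return centers
```

As the authors note, the Laplace noise at useful privacy levels swamps the cluster structure, which is why this baseline fails even for large n.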
Software Dependencies: No. The introduction mentions "TensorFlow" and "PyTorch" only as examples of widely used libraries, not as dependencies of the authors' implementation. No ancillary software or version numbers are provided for the experiments.
Experiment Setup: Yes. "Implementation Details. We modify our algorithm in several places to make it more practical. First, instead of using nets we use locality-sensitive hashing (LSH)... Details on the choice of parameters can be found in Appendix D."
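The LSH substitution mentioned in the implementation details can be illustrated with a minimal random-projection (SimHash-style) family. This is a generic stand-in, not the specific LSH family or parameters the paper uses (those are in its Appendix D): a point hashes to the sign pattern of m random projections, so nearby points tend to share buckets.

```python
import numpy as np

class RandomProjectionLSH:
    """Minimal SimHash-style LSH: a point's bucket is the sign pattern
    of m random linear projections. Illustrative only; the paper's
    actual LSH parameters are specified in its Appendix D."""

    def __init__(self, d, m, rng=None):
        rng = np.random.default_rng(rng)
        # m random hyperplanes through the origin in R^d.
        self.planes = rng.standard_normal((m, d))

    def hash(self, x):
        # Bucket id = tuple of projection signs (1 if nonnegative).
        return tuple((self.planes @ x >= 0).astype(int))
```

Points that are close in angle map to the same sign pattern with high probability, which is what lets LSH buckets play the role of net cells in a practical implementation.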