Near-Optimal Private and Scalable $k$-Clustering

Authors: Vincent Cohen-Addad, Alessandro Epasto, Vahab Mirrokni, Shyam Narayanan, Peilin Zhong

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study the differentially private (DP) k-means and k-median clustering problems for n points in d-dimensional Euclidean space in the massively parallel computation (MPC) model. We provide two near-optimal algorithms, where near-optimality holds in three respects: both achieve (1) O(1) parallel computation rounds, (2) total computational work that is near-linear in n and polynomial in k (i.e., near-linear running time when n is a sufficiently large polynomial in k), and (3) O(1) relative approximation with poly(k, d) additive error.
Researcher Affiliation | Collaboration | Vincent Cohen-Addad (Google Research), Alessandro Epasto (Google Research), Vahab Mirrokni (Google Research), Shyam Narayanan (MIT), Peilin Zhong (Google Research). Emails: {cohenaddad,aepasto,mirrokni,peilinz}@google.com, shyamsn@mit.edu.
Pseudocode | Yes | We present the algorithm pseudocode below in Algorithm 1 and defer the full analysis to Appendix C.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described: there is no code repository link, explicit code release statement, or code in supplementary materials.
Open Datasets | No | The paper focuses on theoretical algorithm design and analysis. It does not conduct empirical studies with datasets and therefore provides no access information for a publicly available or open dataset.
Dataset Splits | No | The paper focuses on theoretical algorithm design and analysis. It does not conduct empirical studies and therefore provides no dataset split information (e.g., training/validation/test percentages or counts).
Hardware Specification | No | The paper focuses on theoretical algorithm design and analysis. It describes no empirical experiments or their setup and therefore provides no hardware details (e.g., GPU/CPU models or memory amounts) used to run experiments.
Software Dependencies | No | The paper focuses on theoretical algorithm design and analysis. It provides no ancillary software details, such as library or solver names with version numbers, needed to replicate experiments.
Experiment Setup | No | The paper focuses on theoretical algorithm design and analysis. It describes no empirical experiments or their setup and therefore provides no details such as hyperparameter values or training configurations.
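Since the paper is purely theoretical, no reference implementation exists. For orientation only, the core object it studies (DP k-means) can be illustrated by a minimal SuLQ-style sketch: one Lloyd iteration made differentially private by adding Laplace noise to cluster sums and counts. This is a well-known baseline, not the paper's near-optimal MPC algorithm; the function name, the bounded-data assumption, and the simplified sensitivity accounting are all assumptions made here for illustration.

```python
import numpy as np

def dp_kmeans_step(points, centers, eps, rng):
    """One Lloyd iteration privatized via the Laplace mechanism.

    Illustrative SuLQ-style baseline, NOT the paper's MPC algorithm.
    Assumes points lie in [-1, 1]^d, so under add/remove neighboring
    datasets one point shifts a cluster's coordinate sums by at most
    d in L1 and its count by 1, giving a (simplified) per-iteration
    sensitivity of d + 1 for each released (count, sum) pair.
    """
    n, d = points.shape
    k = len(centers)
    # Assign each point to its nearest center (squared Euclidean distance).
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    scale = (d + 1) / eps  # Laplace scale for sensitivity d + 1
    new_centers = centers.copy()
    for j in range(k):
        mask = labels == j
        noisy_count = mask.sum() + rng.laplace(scale=scale)
        noisy_sum = points[mask].sum(axis=0) + rng.laplace(scale=scale, size=d)
        if noisy_count > 1.0:  # leave near-empty clusters at their old center
            new_centers[j] = np.clip(noisy_sum / noisy_count, -1.0, 1.0)
    return new_centers

# Toy usage: two well-separated Gaussian blobs in [-1, 1]^2.
rng = np.random.default_rng(0)
pts = np.concatenate([rng.normal(-0.5, 0.05, (200, 2)),
                      rng.normal(0.5, 0.05, (200, 2))]).clip(-1, 1)
centers = np.array([[-0.2, 0.0], [0.2, 0.0]])
for _ in range(5):
    centers = dp_kmeans_step(pts, centers, eps=1.0, rng=rng)
```

Note that each iteration spends a separate privacy budget of eps, so running T iterations costs T * eps by basic composition; the paper's contribution is precisely to avoid such naive accounting while keeping O(1) MPC rounds and poly(k, d) additive error.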