Differentially Private k-Means with Constant Multiplicative Error

Authors: Uri Stemmer, Haim Kaplan

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variables (Result and LLM Response)
Research Type: Theoretical

We design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. In both models, our algorithms achieve significantly better error guarantees than the previous state of the art. In addition, in the local model, our algorithm significantly reduces the number of interaction rounds. Although the problem has been widely studied in the context of differential privacy, all of the existing constructions achieve only super-constant approximation factors. We present for the first time efficient private algorithms for the problem with constant multiplicative error.
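For reference, the guarantee above is stated with respect to the standard Euclidean k-means objective. The following formulation is not quoted in the report; it is the usual definition, under which an algorithm has multiplicative error γ and additive error η if it returns a set of centers C with cost_S(C) ≤ γ·OPT_k(S) + η:

    \[
    \mathrm{cost}_S(C) = \sum_{x \in S} \min_{c \in C} \lVert x - c \rVert_2^2,
    \qquad
    \mathrm{OPT}_k(S) = \min_{|C| = k} \mathrm{cost}_S(C).
    \]

"Constant multiplicative error" in the title refers to achieving γ = O(1), which the abstract identifies as the first such guarantee attained by efficient private algorithms.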
Researcher Affiliation: Collaboration

Haim Kaplan, Tel Aviv University and Google, haimk@post.tau.ac.il; Uri Stemmer, Ben-Gurion University, u@uri.co.il
Pseudocode: Yes

Algorithm Private-k-Means
Input: Database S containing n points in the d-dimensional ball B(0, Λ), failure probability β, privacy parameters ε, δ.
1. Initiate C = ∅, and denote S_1 = S and n_1 = n.
2. For i = 1 to log log n do:
   (a) Run algorithm Private-Centers on the database S_i with parameters ε/log log n, δ/log log n, β/k, and add the returned set of centers to C.
   (b) Let S_{i+1} ⊆ S_i be a subset of S_i containing the n_{i+1} = 2(T+1)·w·k·n_i^{0.1} points with the largest distance to the centers in C, where w = w(n, d, k, β, ε, δ) and T = T(n) will be specified in the analysis.
3. Output C.
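Since no implementation is released (see Open Source Code below), the following is a minimal Python sketch of this outer loop under stated assumptions: private_centers is a hypothetical callable standing in for the paper's Private-Centers subroutine, and w and T are passed in as plain scalars in place of the quantities w(n, d, k, β, ε, δ) and T(n) fixed in the paper's analysis.

    import numpy as np

    def private_k_means(S, k, beta, eps, delta, private_centers, w, T):
        # Sketch of the Private-k-Means outer loop. `private_centers`, `w`,
        # and `T` are placeholders for objects the paper defines elsewhere.
        S_i = np.asarray(S, dtype=float)
        n_i = len(S_i)
        rounds = max(1, int(np.ceil(np.log2(np.log2(n_i)))))  # ~ log log n
        C = []  # step 1: C = empty set, S_1 = S, n_1 = n
        for _ in range(rounds):  # step 2
            # (a) split the privacy budget evenly across the rounds
            C.extend(private_centers(S_i, eps / rounds, delta / rounds, beta / k))
            # (b) keep the n_{i+1} = 2(T+1)*w*k*n_i^0.1 points farthest from C
            n_next = min(len(S_i), int(2 * (T + 1) * w * k * n_i ** 0.1))
            centers = np.asarray(C)
            dists = np.min(
                np.linalg.norm(S_i[:, None, :] - centers[None, :, :], axis=2),
                axis=1,
            )
            S_i = S_i[np.argsort(dists)[-n_next:]]  # largest distances to C
            n_i = n_next
        return np.asarray(C)  # step 3: output C

A non-private stub such as lambda S, eps, delta, beta: [S.mean(axis=0)] exercises the control flow (it provides no privacy, of course); a faithful implementation would substitute the paper's Private-Centers construction.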
Open Source Code: No

The paper does not provide any explicit statements about releasing source code or links to a code repository for the described methodology.

Open Datasets: No

The paper is theoretical and describes algorithms with mathematical guarantees, referring to abstract data points in R^d rather than specific public datasets. There is no mention of dataset names, links, or citations for publicly available data used for training.

Dataset Splits: No

The paper is theoretical and does not conduct empirical experiments, so no training/validation/test dataset splits are mentioned.

Hardware Specification: No

The paper does not provide any specific details about the hardware used to run experiments, which is consistent with its theoretical nature.

Software Dependencies: No

The paper does not list specific software dependencies with version numbers.

Experiment Setup: No

The paper is theoretical and focuses on algorithm design and mathematical proofs; it does not include details such as hyperparameters, optimization settings, or specific training configurations for an experimental setup.