Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Differentially Private k-Means via Exponential Mechanism and Max Cover
Authors: Huy L. Nguyễn, Anamay Chaturvedi, Eric Z. Xu (pp. 9101-9108)
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conclude with some experiments and find an improvement over previously implemented work for this problem. [...] We present an experimental comparison between algorithm 1, the private k-means clustering algorithm from Balcan et al. (2017), and the non-private Lloyd's algorithm. [...] Figure 1: Empirical comparison of algorithm 1 and the private k-means clustering algorithm from (Balcan et al. 2017). |
| Researcher Affiliation | Academia | Huy L. Nguyễn, Anamay Chaturvedi, Eric Z. Xu. Khoury College of Computer Sciences, Northeastern University, 440 Huntington Ave, Boston, Massachusetts 02115 |
| Pseudocode | Yes | Algorithm 1: Private k-means [...] Algorithm 2: Private grid set cover [...] Algorithm 3: Noisy AVG |
| Open Source Code | Yes | The code used for our experiments is available at https://github.com/Anamay-Chaturvedi/Differentially-private-k-means |
| Open Datasets | Yes | Two datasets are used; a synthetic dataset reproducing the construction in Balcan et al. (2017) and the MNIST training dataset (Lecun et al. 1998). The MNIST training dataset uses the raw pixels; it is comprised of 60,000 points with 784 features each. |
| Dataset Splits | No | The paper mentions using the MNIST training dataset and a synthetic dataset, but it does not specify any training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only mentions general experimental setup. |
| Software Dependencies | No | The paper mentions using the Laplace mechanism and replacing it with the Noisy AVG routine, but it does not provide specific version numbers for any software dependencies or libraries (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Implementation details: We set ε = 1 and δ = n^(-1.5) for both algorithms. Similar to Balcan et al. (2017), we project to a smaller subspace of dimension log(n)/2 rather than O(log(n)/ε²); this does not affect privacy. [...] Lloyd's algorithm was executed with 10 iterations. |
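The extracted quotes above reference a Laplace-mechanism-based "Noisy AVG" routine for privately averaging points. As a minimal illustrative sketch (not the authors' implementation), the core idea can be shown with a noisy-sum / noisy-count estimator for points assumed to lie in [0, 1]^d; the function name and budget split here are assumptions for illustration only:

```python
import numpy as np

def noisy_avg(points, epsilon):
    """Differentially private mean of points in [0, 1]^d via the
    Laplace mechanism (illustrative sketch, not the paper's Noisy AVG).

    Replacing one point changes the coordinate-wise sum by at most d
    in L1 norm and the count by at most 1, so adding Laplace noise
    scaled to each sensitivity and splitting epsilon between the two
    queries yields epsilon-DP by basic composition.
    """
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    eps_sum = eps_count = epsilon / 2.0  # assumed even budget split
    noisy_sum = points.sum(axis=0) + np.random.laplace(scale=d / eps_sum, size=d)
    noisy_count = n + np.random.laplace(scale=1.0 / eps_count)
    # Clamp the denominator so the estimate stays finite even when
    # the noisy count is small or negative.
    return noisy_sum / max(noisy_count, 1.0)
```

For large n the noise is negligible relative to the true mean; the paper's routine is more refined, but this captures why no specific software versions are needed to reproduce the mechanism itself.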