k-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy
Authors: Chenglin Fan, Ping Li, Xiaoyun Li
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate the effectiveness of our proposed methods. |
| Researcher Affiliation | Industry | Chenglin Fan, Ping Li, Xiaoyun Li Cognitive Computing Lab Baidu Research 10900 NE 8th St. Bellevue, WA 98004, USA |
| Pseudocode | Yes | Algorithm 1: Local search for k-median clustering (Arya et al., 2004) [...] Algorithm 7: DP-HST local search |
| Open Source Code | No | The paper does not contain any statements or links indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | Discrete Euclidean space. Following previous work, we test k-median clustering on the MNIST hand-written digit dataset (LeCun et al., 1998) with 10 natural clusters (digits 0 to 9). [...] The metric ρ for this experiment is the (weighted) shortest path distance. To generate a size-n graph, we first randomly split the nodes into 10 clusters. Our construction of graphs follows a similar approach to the synthetic pmedinfo graphs provided by the popular OR-Library (Beasley, 1990). (See the graph-generation sketch below the table.) |
| Dataset Splits | No | The paper describes how a 'demand set' D is sampled from a larger universe U, and specifies parameters for algorithms and repetitions for robustness, but it does not detail traditional training, validation, and test splits for the datasets themselves. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies (e.g., libraries or frameworks with version numbers) required to replicate the experiments. |
| Experiment Setup | Yes | For non-DP methods, we set α = 10⁻³ in Algorithm 1 and the maximum number of iterations to 20. For DP methods, we run the algorithms for T = 20 steps and report the results with ϵ = 1 (comparisons/results with other T and ϵ are similar). We test k ∈ {2, 5, 10, 15, 20}. The average cost over T iterations is reported for robustness. All the results are averaged over 10 independent repetitions. For non-DP tasks, we set L = 6. For DP clustering, we use L = 8. (See the local-search sketch below the table.) |
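
The graph construction quoted in the Open Datasets row can be approximated as follows. This is a minimal sketch, assuming each node links to a handful of random peers and that intra-cluster edges receive smaller weights than inter-cluster ones; the paper only states that the construction resembles the pmedinfo graphs from the OR-Library, so the degree and weight ranges here are illustrative assumptions. The metric ρ is then the weighted shortest-path distance.

```python
# Hypothetical graph generation: n nodes, 10 planted clusters, and weighted
# shortest-path distances as the clustering metric rho.
import numpy as np
import networkx as nx

def make_clustered_graph(n=500, n_clusters=10, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, n_clusters, size=n)  # random split into clusters
    G = nx.Graph()
    G.add_nodes_from(range(n))
    for u in range(n):
        # Assumed topology: 5 random neighbors per node; cheap edges inside
        # a cluster, expensive edges across clusters.
        for v in rng.choice(n, size=5, replace=False):
            if u != v:
                w = rng.uniform(1, 2) if labels[u] == labels[v] else rng.uniform(5, 10)
                G.add_edge(u, v, weight=w)
    return G, labels

G, labels = make_clustered_graph()
# rho(u, v): all-pairs weighted shortest-path distances (Dijkstra).
dist = dict(nx.all_pairs_dijkstra_path_length(G, weight="weight"))
```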
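
The Experiment Setup row can likewise be grounded with a sketch of the single-swap local search of Algorithm 1 (Arya et al., 2004), wired to the reported settings (α = 10⁻³, at most 20 iterations). The random initialization and the (1 − α/k) acceptance threshold are assumptions for illustration; the paper's contribution is precisely a better (HST-based, optionally differentially private) initialization, which this sketch does not reproduce.

```python
# Single-swap local search for k-median, in the spirit of Arya et al. (2004).
# `dist` is any metric lookup, e.g. the shortest-path table built above.
import random

def kmedian_cost(centers, points, dist):
    # k-median objective: sum of each point's distance to its nearest center.
    return sum(min(dist[p][c] for c in centers) for p in points)

def local_search_kmedian(points, k, dist, alpha=1e-3, max_iters=20, seed=0):
    random.seed(seed)
    centers = set(random.sample(points, k))  # assumed: random initialization
    cost = kmedian_cost(centers, points, dist)
    for _ in range(max_iters):
        best = None
        for out in list(centers):
            for inc in points:
                if inc in centers:
                    continue
                cand = (centers - {out}) | {inc}
                c = kmedian_cost(cand, points, dist)
                # Accept only swaps that improve the cost by a (1 - alpha/k)
                # factor, the usual polynomial-time local-search rule.
                if c <= (1 - alpha / k) * cost and (best is None or c < best[0]):
                    best = (c, cand)
        if best is None:
            break  # no sufficiently improving swap: local optimum reached
        cost, centers = best
    return centers, cost

centers, cost = local_search_kmedian(list(G.nodes), k=10, dist=dist)
```

Note on cost: each iteration evaluates every (out, in) swap, i.e., O(k·n) objective evaluations, which is the loop that a good initialization is meant to cut short in practice.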