Fully Dynamic $k$-Clustering in $\tilde O(k)$ Update Time
Authors: Sayan Bhattacharya, Martín Costa, Silvio Lattanzi, Nikos Parotsidis
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical analysis with the first in-depth experimental study for the dynamic k-median problem on general metrics, focusing on comparing our dynamic algorithm to the current state-of-the-art by Henzinger and Kale [20]. ... We supplement the above theorem with experiments that compare our algorithm with that of [20]. To the best of our knowledge, this is the first work in the dynamic clustering literature with a detailed experimental evaluation of dynamic k-median algorithms for general metrics. |
| Researcher Affiliation | Collaboration | Sayan Bhattacharya University of Warwick s.bhattacharya@warwick.ac.uk Martin Costa University of Warwick martin.costa@warwick.ac.uk Silvio Lattanzi Google Research silviol@google.com Nikos Parotsidis Google Research nikosp@google.com |
| Pseudocode | Yes | Algorithm 1 Static Algo(U), Algorithm 2 Almost Cover(U), Algorithm 3 Preprocess(U), Algorithm 4 Construct From Layer(i), Algorithm 5 Insert(x), Algorithm 6 Delete(x), Algorithm 7 Rebuild |
| Open Source Code | Yes | All of our code is written in Java and is available online.4 [footnote 4: https://github.com/martin-costa/NeurIPS23-dynamic-k-clustering] |
| Open Datasets | Yes | Datasets. We conduct the empirical evaluation of our algorithm on five datasets from the UCI repository [17]: (1) Census [23] with 2, 458, 285 points of dimension 68, (2) KDD-Cup [32] containing 311, 029 points of dimension 74, (3) Song [5] with 515, 345 points of dimension 90, (4) Drift [33, 30] with 13, 910 points of dimension 129, and (5) SIFT10M [17] with 11, 164, 866 points of dimension 128. |
| Dataset Splits | No | The paper describes a sliding window approach for generating dynamic instances but does not specify a traditional train/validation/test dataset split. |
| Hardware Specification | Yes | We used a machine with 8 cores, a 2.9Ghz processor, and 16 Gi B of main memory. |
| Software Dependencies | No | The paper states 'All of our code is written in Java' but does not specify the version of Java or any other software dependencies with their version numbers. |
| Experiment Setup | Yes | OURALG has a parameter ϕ which we set to be the number of points sampled during the construction of a layer in Algorithm 2. ... We fix the constants ϵ and β used by our algorithm (see Section 3) to 0.2 and 0.5 respectively. HK has a parameter ψ which we set to be the number of points sampled during the static coreset construction from [9]... For each of ϕ and ψ, we experimented with the values 250, 500, 1000. ... We compare OURALG(ϕ = 500) against HK(ψ = 1000). |