Near-Optimal $k$-Clustering in the Sliding Window Model
Authors: David Woodruff, Peilin Zhong, Samson Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct simple empirical demonstrations as proof-of-concepts to illustrate the benefits of our algorithm. Our empirical evaluations were conducted using Python 3.10 using a 64-bit operating system on an AMD Ryzen 7 5700U CPU, with 8GB RAM and 8 cores with base clock 1.80 GHz. |
| Researcher Affiliation | Collaboration | David P. Woodruff (CMU, dwoodruf@cs.cmu.edu); Peilin Zhong (Google Research, peilinz@google.com); Samson Zhou (Texas A&M University, samsonzhou@gmail.com) |
| Pseudocode | Yes | Algorithm 1: RINGSAMPLE. Algorithm 2: Merge-and-reduce framework for randomized algorithms in the sliding window model, using randomized constructions of online coresets. |
| Open Source Code | No | The paper does not provide any statements about releasing code for the described methodology or links to a code repository. |
| Open Datasets | Yes | The first component of our dataset consists of the points of the SKIN (Skin Segmentation) dataset X from the publicly available UCI repository [6], which was also used in the experiments of [8]. |
| Dataset Splits | No | The paper describes aspects of the experimental setup such as iterations and initialization methods, and states the ranges of m and k values tested. However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages or absolute counts). |
| Hardware Specification | Yes | Our empirical evaluations were conducted using Python 3.10 using a 64-bit operating system on an AMD Ryzen 7 5700U CPU, with 8GB RAM and 8 cores with base clock 1.80 GHz. |
| Software Dependencies | No | The paper mentions 'Python 3.10', but it does not list multiple key software components with their specific version numbers or a self-contained solver with a version number, which is required for a reproducible description. |
| Experiment Setup | Yes | For each of the instances of Lloyd's algorithm, either on the entire dataset X or the sampled coreset C, we use 10 iterations using the k-means++ initialization. |
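
The setup quoted in the last row (Lloyd's algorithm, 10 iterations, k-means++ initialization) can be illustrated with a minimal pure-Python sketch. This is not the authors' code, which the paper does not release. The function names `sq_dist`, `kmeanspp_init`, `lloyd`, and `kmeans_cost`, the optional `weights` argument (included on the assumption that a coreset C carries per-point weights), and the `seed` parameter are all hypothetical choices made for this example.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_init(points, weights, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to weight * squared distance to the nearest chosen center."""
    centers = [rng.choices(points, weights=weights, k=1)[0]]
    while len(centers) < k:
        d2 = [w * min(sq_dist(p, c) for c in centers)
              for p, w in zip(points, weights)]
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def lloyd(points, k, weights=None, iterations=10, seed=0):
    """Run a fixed number of Lloyd iterations from a k-means++ start.

    `weights` is an assumption of this sketch: pass None for an unweighted
    dataset, or per-point weights for a weighted sample.
    """
    rng = random.Random(seed)
    if weights is None:
        weights = [1.0] * len(points)
    centers = kmeanspp_init(points, weights, k, rng)
    dim = len(points[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        # Assignment step: attach each point to its nearest center.
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Update step: move each center to the weighted mean of its cluster.
        # A center with an empty cluster is left where it is.
        centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers


def kmeans_cost(points, centers, weights=None):
    """Weighted k-means cost: sum of squared distances to the nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))
```

To compare a full dataset against a sample in the spirit of the quoted setup, one would call `lloyd` once on all points and once on the weighted sample, then evaluate both sets of centers with `kmeans_cost` on the full dataset.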