Near-Optimal $k$-Clustering in the Sliding Window Model

Authors: David Woodruff, Peilin Zhong, Samson Zhou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct simple empirical demonstrations as proof-of-concepts to illustrate the benefits of our algorithm. Our empirical evaluations were conducted using Python 3.10 using a 64-bit operating system on an AMD Ryzen 7 5700U CPU, with 8GB RAM and 8 cores with base clock 1.80 GHz.
Researcher Affiliation | Collaboration | David P. Woodruff (CMU, dwoodruf@cs.cmu.edu); Peilin Zhong (Google Research, peilinz@google.com); Samson Zhou (Texas A&M University, samsonzhou@gmail.com)
Pseudocode | Yes | Algorithm 1: RINGSAMPLE. Algorithm 2: Merge-and-reduce framework for randomized algorithms in the sliding window model, using randomized constructions of online coresets.
Open Source Code | No | The paper does not provide any statements about releasing code for the described methodology or links to a code repository.
Open Datasets | Yes | The first component of our dataset consists of the points of the SKIN (Skin Segmentation) dataset X from the publicly available UCI repository [6], which was also used in the experiments of [8].
Dataset Splits | No | The paper describes aspects of the experimental setup such as iterations and initialization methods, and states the ranges of m and k values tested. However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages or absolute counts).
Hardware Specification | Yes | Our empirical evaluations were conducted using Python 3.10 using a 64-bit operating system on an AMD Ryzen 7 5700U CPU, with 8GB RAM and 8 cores with base clock 1.80 GHz.
Software Dependencies | No | The paper mentions 'Python 3.10', but it does not list multiple key software components with their specific version numbers or a self-contained solver with a version number, which is required for a reproducible description.
Experiment Setup | Yes | For each of the instances of Lloyd's algorithm, either on the entire dataset X or the sampled coreset C, we use 10 iterations using the k-means++ initialization.
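
The experiment setup quoted above (Lloyd's algorithm, 10 iterations, k-means++ initialization) can be sketched in plain Python as follows. This is a minimal illustration and not the authors' code, which the paper does not release. The function names (`sq_dist`, `kmeans_pp_init`, `lloyd`, `cost`), the synthetic two-blob data, and the fixed seed are all assumptions made for this example. The sketch treats every point as unweighted, so running it on a weighted coreset C would need a weighted variant.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: each new center is drawn with probability
    proportional to its squared distance to the nearest chosen center."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # All points coincide with existing centers; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def lloyd(points, k, iterations=10, seed=0):
    """Lloyd's algorithm with k-means++ initialization.
    iterations=10 mirrors the setup quoted above; seed is an assumption."""
    rng = random.Random(seed)
    centers = kmeans_pp_init(points, k, rng)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers

def cost(points, centers):
    """k-means cost: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Hypothetical stand-in data: two Gaussian blobs (not the SKIN dataset).
data_rng = random.Random(1)
X = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(100)]
X += [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(100)]

centers_full = lloyd(X, k=2, iterations=10, seed=0)
print("cost on X:", cost(X, centers_full))
```

To compare against a summary of the data, the same `lloyd` call can be run on a subset of `X` and the resulting centers evaluated with `cost(X, ...)` on the full set. A uniform subsample would only be a placeholder here; it is not the RINGSAMPLE coreset construction from the paper.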