Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Near-Optimal $k$-Clustering in the Sliding Window Model
Authors: David Woodruff, Peilin Zhong, Samson Zhou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct simple empirical demonstrations as proof-of-concepts to illustrate the benefits of our algorithm. Our empirical evaluations were conducted using Python 3.10 using a 64-bit operating system on an AMD Ryzen 7 5700U CPU, with 8GB RAM and 8 cores with base clock 1.80 GHz. |
| Researcher Affiliation | Collaboration | David P. Woodruff CMU EMAIL Peilin Zhong Google Research EMAIL Samson Zhou Texas A&M University EMAIL |
| Pseudocode | Yes | Algorithm 1 RINGSAMPLE Algorithm 2 Merge-and-reduce framework for randomized algorithms in the sliding window model, using randomized constructions of online coresets |
| Open Source Code | No | The paper does not provide any statements about releasing code for the described methodology or links to a code repository. |
| Open Datasets | Yes | The first component of our dataset consists of the points of the SKIN (Skin Segmentation) dataset X from the publicly available UCI repository [6], which was also used in the experiments of [8]. |
| Dataset Splits | No | The paper describes aspects of the experimental setup such as iterations and initialization methods, and states the ranges of m and k values tested. However, it does not specify explicit training, validation, or test dataset splits (e.g., percentages or absolute counts). |
| Hardware Specification | Yes | Our empirical evaluations were conducted using Python 3.10 using a 64-bit operating system on an AMD Ryzen 7 5700U CPU, with 8GB RAM and 8 cores with base clock 1.80 GHz. |
| Software Dependencies | No | The paper mentions 'Python 3.10' but does not list other key software components (e.g., libraries) with specific version numbers, or a self-contained solver with a version number, as a reproducible description requires. |
| Experiment Setup | Yes | For each of the instances of Lloyd's algorithm, either on the entire dataset X or the sampled coreset C, we use 10 iterations using the k-means++ initialization. |
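The Experiment Setup row describes running Lloyd's algorithm for 10 iterations with k-means++ initialization, once on the full dataset X and once on a sampled coreset C. The following is a minimal, standard-library-only Python sketch of that comparison, written under explicit assumptions: the data are synthetic Gaussian blobs standing in for SKIN, the "coreset" is a uniform sample with weight n/m per point rather than the paper's construction, and the names `kmeanspp_init`, `lloyd`, `cost`, as well as `m = 60`, `k = 3` and the seeds, are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points: List[Point], weights: List[float], k: int,
                  rng: random.Random) -> List[Point]:
    """Weighted k-means++ seeding: pick the first center with probability
    proportional to its weight, then each next center with probability
    proportional to weight * (squared distance to the nearest chosen center)."""
    centers = [rng.choices(points, weights=weights)[0]]
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        probs = [w * d for w, d in zip(weights, d2)]
        if sum(probs) == 0:  # every point already coincides with a center
            centers.append(rng.choices(points, weights=weights)[0])
        else:
            centers.append(rng.choices(points, weights=probs)[0])
        d2 = [min(d, sq_dist(p, centers[-1])) for d, p in zip(d2, points)]
    return centers


def lloyd(points: List[Point], weights: List[float], k: int,
          iters: int = 10, seed: int = 0) -> List[Point]:
    """Weighted Lloyd's algorithm with k-means++ initialization."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, weights, k, rng)
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        mass = [0.0] * k
        # Assignment step: each point goes to its nearest current center.
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            mass[j] += w
            for t in range(dim):
                sums[j][t] += w * p[t]
        # Update step: move each center to the weighted mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [tuple(s / mass[j] for s in sums[j]) if mass[j] > 0 else centers[j]
                   for j in range(k)]
    return centers


def cost(points: List[Point], weights: List[float], centers: List[Point]) -> float:
    """Weighted k-means cost: sum of weight * squared distance to nearest center."""
    return sum(w * min(sq_dist(p, c) for c in centers)
               for p, w in zip(points, weights))


if __name__ == "__main__":
    # Hypothetical stand-in for dataset X: three synthetic 3-D Gaussian blobs
    # (SKIN from the UCI repository is not loaded here).
    data_rng = random.Random(1)
    X = [tuple(data_rng.gauss(mu, 1.0) for _ in range(3))
         for mu in (0.0, 20.0, 40.0) for _ in range(500)]
    w_X = [1.0] * len(X)

    # Hypothetical stand-in for the coreset C: a uniform sample of m points,
    # each weighted n/m (NOT the paper's sampling procedure).
    m = 60
    C = data_rng.sample(X, m)
    w_C = [len(X) / m] * m

    k = 3
    full = cost(X, w_X, lloyd(X, w_X, k))
    via_coreset = cost(X, w_X, lloyd(C, w_C, k))
    print(f"cost on X, centers from X: {full:.1f}")
    print(f"cost on X, centers from C: {via_coreset:.1f}")
```

Both center sets are scored on the full X, which is the natural comparison here: it measures how much clustering quality is lost by running Lloyd's algorithm on the smaller weighted sample instead of on every point.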