Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Coresets for Ordered Weighted Clustering
Authors: Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our algorithm on a real geographical data set, and we find our coreset leads to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights. ... We evaluate our coreset algorithm experimentally on real 2D geographical data. Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017), with complex objects such as roads replaced with their geometric means. The data set consists of about 1.5 million 2D points and is illustrated in Figure 2. |
| Researcher Affiliation | Academia | 1Johns Hopkins University, USA. 2Weizmann Institute of Science, Israel. |
| Pseudocode | No | The paper describes the algorithms and constructions in prose and mathematical notation but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access (link or explicit statement) to the source code for the methodology described. |
| Open Datasets | Yes | Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017)... Map data copyrighted by Open Street Map contributors and is available from https://www.openstreetmap.org. |
| Dataset Splits | No | The paper mentions evaluating empirical error by sampling random centers, but it does not specify explicit dataset splits (e.g., train/validation/test percentages or counts) needed to reproduce data partitioning for model training or evaluation in a structured way. |
| Hardware Specification | Yes | All our experiments were conducted on a laptop computer with an Intel 4-core 2.8 GHz CPU and 64 GB memory. |
| Software Dependencies | No | The algorithms are written in Java programming language and are implemented single threaded. No specific library names with version numbers are provided. |
| Experiment Setup | Yes | To examine the performance of our coreset algorithm for p-CENTRUM (using the heuristic for the initial centers), we execute it with parameters p = 0.1n and k = 2, and let the error guarantee ϵ vary... We experiment with the different parameters for coresets of p-CENTRUM, and we find out that the empirical error is always far lower than our error guarantee ϵ. |
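
For readers unfamiliar with the objective named in the Experiment Setup row, the following is a minimal sketch of the p-CENTRUM cost: the sum of the p largest distances from the data points to their nearest center. It is not the authors' code (the paper reports a Java implementation that is not publicly available). The function names `p_centrum_cost` and `relative_error`, the toy points, and the reading of "empirical error" as a relative difference between a coreset-based cost and the full-data cost are all assumptions made for illustration.

```python
import math


def p_centrum_cost(points, centers, p):
    """Sum of the p largest point-to-nearest-center Euclidean distances.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Distance from each point to its closest center.
    dists = [min(math.dist(x, c) for c in centers) for x in points]
    # Order the distances from largest to smallest and keep the top p.
    dists.sort(reverse=True)
    return sum(dists[:p])


def relative_error(approx, exact):
    """Relative deviation of an approximate cost from the exact cost.

    Assumed definition of "empirical error"; the paper may define it differently.
    """
    return abs(approx - exact) / exact


# Toy data on a line, with k = 2 centers.
points = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

cost_p1 = p_centrum_cost(points, centers, 1)  # largest distance only
cost_p2 = p_centrum_cost(points, centers, 2)  # two largest distances
cost_p4 = p_centrum_cost(points, centers, 4)  # all distances
```

With p = 1 the cost reduces to the k-center objective (the single largest distance), and with p equal to the number of points it reduces to the k-median objective (the sum of all distances). The setting p = 0.1n quoted in the table lies between these two extremes.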