Coresets for Ordered Weighted Clustering
Authors: Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our algorithm on a real geographical data set, and we find our coreset leads to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights. ... We evaluate our coreset algorithm experimentally on real 2D geographical data. Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017), with complex objects such as roads replaced with their geometric means. The data set consists of about 1.5 million 2D points and is illustrated in Figure 2. |
| Researcher Affiliation | Academia | 1Johns Hopkins University, USA. 2Weizmann Institute of Science, Israel. |
| Pseudocode | No | The paper describes the algorithms and constructions in prose and mathematical notation but does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access (link or explicit statement) to the source code for the methodology described. |
| Open Datasets | Yes | Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017)... Map data copyrighted by Open Street Map contributors and is available from https://www.openstreetmap.org. |
| Dataset Splits | No | The paper mentions evaluating empirical error by sampling random centers, but it does not specify explicit dataset splits (e.g., train/validation/test percentages or counts) needed to reproduce data partitioning for model training or evaluation in a structured way. |
| Hardware Specification | Yes | All our experiments were conducted on a laptop computer with an Intel 4-core 2.8 GHz CPU and 64 GB memory. |
| Software Dependencies | No | The algorithms are written in Java programming language and are implemented single threaded. No specific library names with version numbers are provided. |
| Experiment Setup | Yes | To examine the performance of our coreset algorithm for p-CENTRUM (using the heuristic for the initial centers), we execute it with parameters p = 0.1n and k = 2, and let the error guarantee ϵ vary... We experiment with the different parameters for coresets of p-CENTRUM, and we find out that the empirical error is always far lower than our error guarantee ϵ. |
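
The setup above (p-CENTRUM with p = 0.1n and k = 2, with empirical error measured on randomly sampled centers) can be illustrated with a minimal Python sketch. It is not the authors' Java implementation, which is not published. It assumes the standard p-CENTRUM objective (the sum of the p largest point-to-nearest-center distances), treats the weights of a summary as multiplicities, and samples centers uniformly from the bounding box of the data. The function names, the `trials` and `seed` parameters, and the choice of reporting the worst relative error are all hypothetical.

```python
import math
import random


def nearest_center_distance(point, centers):
    """Euclidean distance from a point to its closest center."""
    return min(math.dist(point, c) for c in centers)


def p_centrum_cost(points, centers, p, weights=None):
    """Sum of the p largest point-to-nearest-center distances.

    Assumption: weights act as multiplicities, so a point of weight w
    counts as w copies when selecting the p farthest points.
    """
    if weights is None:
        weights = [1.0] * len(points)
    # Pair each distance with its weight and visit the farthest points first.
    dists = sorted(
        ((nearest_center_distance(x, centers), w) for x, w in zip(points, weights)),
        reverse=True,
    )
    remaining = float(p)
    cost = 0.0
    for d, w in dists:
        if remaining <= 0:
            break
        take = min(w, remaining)  # use only as much weight as is still needed
        cost += take * d
        remaining -= take
    return cost


def empirical_error(points, summary, summary_weights, p, k=2, trials=20, seed=0):
    """Worst relative cost error of a weighted summary over random center sets.

    Assumption: centers are drawn uniformly from the bounding box of the data;
    the source only says that random centers are sampled.
    """
    rng = random.Random(seed)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    worst = 0.0
    for _ in range(trials):
        centers = [
            (rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys)))
            for _ in range(k)
        ]
        full = p_centrum_cost(points, centers, p)
        approx = p_centrum_cost(summary, centers, p, summary_weights)
        if full > 0:
            worst = max(worst, abs(approx - full) / full)
    return worst
```

To mirror the reported parameters, one would call `empirical_error(points, summary, summary_weights, p=int(0.1 * len(points)), k=2)` and compare the result against the chosen ϵ. The summary itself would have to come from a separate coreset construction, which this sketch does not attempt.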