Coresets for Ordered Weighted Clustering

Authors: Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our algorithm on a real geographical data set, and we find our coreset leads to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights. ... We evaluate our coreset algorithm experimentally on real 2D geographical data. Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017), with complex objects such as roads replaced with their geometric means. The data set consists of about 1.5 million 2D points and is illustrated in Figure 2.
Researcher Affiliation | Academia | 1 Johns Hopkins University, USA. 2 Weizmann Institute of Science, Israel.
Pseudocode | No | The paper describes the algorithms and constructions in prose and mathematical notation but does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access (link or explicit statement) to the source code for the methodology described.
Open Datasets | Yes | Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017)... Map data copyrighted by Open Street Map contributors and is available from https://www.openstreetmap.org.
Dataset Splits | No | The paper mentions evaluating empirical error by sampling random centers, but it does not specify explicit dataset splits (e.g., train/validation/test percentages or counts) needed to reproduce the data partitioning.
Hardware Specification | Yes | All our experiments were conducted on a laptop computer with an Intel 4-core 2.8 GHz CPU and 64 GB memory.
Software Dependencies | No | The algorithms are written in Java programming language and are implemented single threaded. No specific library names with version numbers are provided.
Experiment Setup | Yes | To examine the performance of our coreset algorithm for p-CENTRUM (using the heuristic for the initial centers), we execute it with parameters p = 0.1n and k = 2, and let the error guarantee ϵ vary... We experiment with the different parameters for coresets of p-CENTRUM, and we find out that the empirical error is always far lower than our error guarantee ϵ.
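
The rows above mention measuring empirical error by sampling random centers for p-CENTRUM with p = 0.1n and k = 2. Since the paper releases no code, the following Python sketch only illustrates what such a measurement could look like. It is not the authors' Java implementation: the function names, the choice to draw centers from the data points, the treatment of weights as fractional multiplicities, and the uniform sample used as a stand-in for the coreset are all assumptions made for illustration.

```python
import math
import random


def p_centrum_cost(points, weights, centers, p):
    """Sum of the p largest distances from points to their nearest center.

    Weights are treated as (possibly fractional) multiplicities. This is an
    assumption made here so the same function can score a weighted coreset.
    """
    # Distance of every point to its closest center, largest first.
    dists = sorted(
        ((min(math.dist(x, c) for c in centers), w) for x, w in zip(points, weights)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    remaining = float(p)
    total = 0.0
    for d, w in dists:
        take = min(w, remaining)  # consume weight until p units are counted
        total += take * d
        remaining -= take
        if remaining <= 0:
            break
    return total


def empirical_error(points, coreset, coreset_weights, k, p, trials, rng):
    """Largest relative cost error observed over `trials` random k-center sets."""
    unit = [1.0] * len(points)
    worst = 0.0
    for _ in range(trials):
        centers = rng.sample(points, k)  # assumption: centers drawn from the data
        full = p_centrum_cost(points, unit, centers, p)
        approx = p_centrum_cost(coreset, coreset_weights, centers, p)
        if full > 0:
            worst = max(worst, abs(approx - full) / full)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(2000)]  # synthetic 2D points
    m = 200
    sample = rng.sample(data, m)        # uniform sample: a stand-in, NOT the paper's coreset
    weights = [len(data) / m] * m       # each sampled point represents n/m originals
    err = empirical_error(data, sample, weights, k=2, p=0.1 * len(data), trials=20, rng=rng)
    print(f"max relative error over sampled centers: {err:.4f}")
```

Taking the maximum over sampled center sets mirrors the worst-case flavor of a coreset guarantee. A mean or a high percentile would be an equally reasonable statistic, and the summary does not say which one the paper reports.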