reproducibilityindex.ai

Coresets for Ordered Weighted Clustering

Authors: Vladimir Braverman, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate our algorithm on a real geographical data set, and we find our coreset leads to a massive speedup of clustering computations, while maintaining high accuracy for a range of weights. ... We evaluate our coreset algorithm experimentally on real 2D geographical data. Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017), with complex objects such as roads replaced with their geometric means. The data set consists of about 1.5 million 2D points and is illustrated in Figure 2.
Researcher Affiliation	Academia	Johns Hopkins University, USA. Weizmann Institute of Science, Israel.
Pseudocode	No	The paper describes the algorithms and constructions in prose and mathematical notation but does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any concrete access (link or explicit statement) to the source code for the methodology described.
Open Datasets	Yes	Our data set is the whole Hong Kong region extracted from Open Street Map (Open Street Map contributors, 2017)... Map data copyrighted by Open Street Map contributors and is available from https://www.openstreetmap.org.
Dataset Splits	No	The paper mentions evaluating empirical error by sampling random centers, but it does not specify dataset splits (e.g., train/validation/test percentages or counts) or any other data partitioning needed to reproduce the evaluation.
Hardware Specification	Yes	All our experiments were conducted on a laptop computer with an Intel 4-core 2.8 GHz CPU and 64 GB memory.
Software Dependencies	No	The algorithms are written in the Java programming language and implemented single-threaded. No specific library names or version numbers are provided.
Experiment Setup	Yes	To examine the performance of our coreset algorithm for p-CENTRUM (using the heuristic for the initial centers), we execute it with parameters p = 0.1n and k = 2, and let the error guarantee ϵ vary... We experiment with the different parameters for coresets of p-CENTRUM, and we find out that the empirical error is always far lower than our error guarantee ϵ.
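The experiment setup (p-CENTRUM with p = 0.1n and k = 2, with empirical error measured at randomly sampled centers) can be illustrated with a small Java sketch; Java is used only because the paper reports a Java implementation. This is not the authors' code. The class and method names (`PCentrumCheck`, `cost`, `empiricalError`) are hypothetical. Treating a point's weight as its multiplicity, with fractional truncation at the p boundary, is an assumption, and drawing centers uniformly from a bounding box is an assumed sampling scheme, since the summary does not say how the random centers are drawn. The toy "coreset" merely merges duplicate points into one point of weight 2, which preserves the cost exactly; it is not the paper's coreset construction, and the initial-center heuristic is not modeled.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, NOT the authors' implementation: p-CENTRUM cost on weighted
// 2D points, plus an empirical-error check of a coreset at random center sets.
class PCentrumCheck {

    // Euclidean distance from q to its nearest center.
    static double distToNearest(double[] q, double[][] centers) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] c : centers) {
            double dx = q[0] - c[0], dy = q[1] - c[1];
            best = Math.min(best, Math.sqrt(dx * dx + dy * dy));
        }
        return best;
    }

    // p-CENTRUM: sum of the p largest nearest-center distances. Assumption: a point of
    // weight w counts as w copies; the point straddling the p boundary contributes
    // only the remaining fraction of its weight.
    static double cost(double[][] pts, double[] w, double[][] centers, double p) {
        int n = pts.length;
        double[] d = new double[n];
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            d[i] = distToNearest(pts[i], centers);
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> Double.compare(d[b], d[a])); // largest distance first
        double remaining = p;
        double total = 0.0;
        for (int i : order) {
            if (remaining <= 0) {
                break;
            }
            double take = Math.min(w[i], remaining);
            total += take * d[i];
            remaining -= take;
        }
        return total;
    }

    // Max relative error |cost(coreset) - cost(full)| / cost(full) over `trials` random
    // k-center sets, each center uniform in the box [lo, hi] (assumed sampling scheme).
    static double empiricalError(double[][] pts, double[] w, double[][] core, double[] coreW,
                                 int k, double p, double[] lo, double[] hi, int trials, long seed) {
        Random rng = new Random(seed);
        double worst = 0.0;
        for (int t = 0; t < trials; t++) {
            double[][] centers = new double[k][2];
            for (double[] c : centers) {
                c[0] = lo[0] + rng.nextDouble() * (hi[0] - lo[0]);
                c[1] = lo[1] + rng.nextDouble() * (hi[1] - lo[1]);
            }
            double full = cost(pts, w, centers, p);
            double approx = cost(core, coreW, centers, p);
            if (full > 0) {
                worst = Math.max(worst, Math.abs(approx - full) / full);
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        int m = 20;
        double[][] full = new double[2 * m][];  // each point x = 0..19 appears twice
        double[] fullW = new double[2 * m];
        double[][] core = new double[m][];      // toy "coreset": duplicates merged
        double[] coreW = new double[m];
        for (int i = 0; i < m; i++) {
            full[2 * i] = new double[] {i, 0};
            full[2 * i + 1] = new double[] {i, 0};
            fullW[2 * i] = fullW[2 * i + 1] = 1.0;
            core[i] = new double[] {i, 0};
            coreW[i] = 2.0;
        }
        double p = 0.1 * full.length;           // p = 0.1n = 4
        double[][] centers = {{0, 0}, {19, 0}}; // k = 2
        System.out.println(cost(full, fullW, centers, p)); // 36.0
        System.out.println(cost(core, coreW, centers, p)); // 36.0
        System.out.println(empiricalError(full, fullW, core, coreW, 2, p,
                new double[] {0, -5}, new double[] {19, 5}, 100, 42L));
    }
}
```

Running `main` prints 36.0 twice (four copies at distance 9 from the nearest center), followed by an empirical error that should be zero up to floating-point rounding. To mirror the reported experiments, `core`/`coreW` would instead hold the output of an actual coreset construction, and the measured error would be compared against ϵ.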