Coresets for Clustering in Graphs of Bounded Treewidth
Authors: Daniel Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our algorithm and evaluate its performance on real-world road networks. Our implementation generally follows the importance sampling algorithm as in Section 3. |
| Researcher Affiliation | Academia | (1) Johns Hopkins University, USA; (2) Yale University, USA; (3) Weizmann Institute of Science, Israel. |
| Pseudocode | Yes | Algorithm 1 ITERATEDTHORUPSAMPLING; Algorithm 2 THOSAMPLEBEST |
| Open Source Code | Yes | Our software is open source and freely available, and implemented in C++17. |
| Open Datasets | Yes | Throughout the experiments the graph G is a road network of New York State extracted from Open Street Map (Open Street Map contributors, 2020) and clipped by bounding box to enclose New York City (NYC). |
| Dataset Splits | No | The paper describes how data points X are chosen and evaluated (e.g., X_uni, X_man) and mentions evaluating empirical error. However, it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts for a model training process, as this is a clustering coreset rather than a supervised learning task. |
| Hardware Specification | Yes | All experiments were performed on a Lenovo x3850 X6 system with 4 2.0 GHz Intel E7-4830 CPUs, each with 14 processor cores with hyperthreading enabled. The system had 1 TB of RAM. |
| Software Dependencies | No | The paper states, "Our software is open source and freely available, and implemented in C++17." While C++17 is a specific version, it does not list additional key software components or libraries with their specific version numbers that would be needed for replication. |
| Experiment Setup | Yes | We define the empirical error of a coreset D and a center set C ⊆ V as err(D, C) := \|cost(D, C)/cost(X, C) − 1\| (corresponding to ϵ in the definition of a coreset). Since by definition a coreset preserves the objective for all center sets, we evaluate the empirical error by randomly picking 2000 center sets C ⊆ V, and reporting the maximum empirical error err(D, C) over all these C. For the sake of evaluation, we compare the maximum empirical error of our coreset with a baseline of a uniform sample, where points are drawn uniformly at random from X and assigned equal weight (that sums to \|X\|). To reduce the variance introduced by the randomness in the coreset construction, we repeat each construction 10 times and report the average of their maximum empirical error. |
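
The evaluation protocol quoted in the Experiment Setup row can be summarized in code. The following C++17 snippet is a minimal sketch, not the authors' released implementation: the shortest-path distance oracle `dist`, the coreset builder passed as a callback, and all identifiers here are illustrative assumptions. It shows the empirical error err(D, C), its maximum over 2000 uniformly random center sets, the uniform-sample baseline, and the 10-repetition average described above.

```cpp
// Minimal sketch of the coreset evaluation protocol (not the authors' code).
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <random>
#include <vector>

struct WeightedPoint { int vertex; double weight; };
using PointSet = std::vector<WeightedPoint>;
// Assumed oracle: shortest-path distance between two vertices of the road network.
using DistFn = std::function<double(int, int)>;

// Weighted clustering cost of `points` against center set `C` (k-median-style
// objective shown here; the exact objective follows the paper).
double clustering_cost(const PointSet& points, const std::vector<int>& C, const DistFn& dist) {
    double total = 0.0;
    for (const auto& p : points) {
        double best = std::numeric_limits<double>::infinity();
        for (int c : C) best = std::min(best, dist(p.vertex, c));
        total += p.weight * best;
    }
    return total;
}

// Empirical error err(D, C) = | cost(D, C) / cost(X, C) - 1 |.
double empirical_error(const PointSet& D, const PointSet& X,
                       const std::vector<int>& C, const DistFn& dist) {
    return std::abs(clustering_cost(D, C, dist) / clustering_cost(X, C, dist) - 1.0);
}

// Maximum empirical error over `trials` center sets of size k drawn uniformly
// at random from the vertex set V (2000 trials in the paper's setup).
double max_empirical_error(const PointSet& D, const PointSet& X,
                           const std::vector<int>& V, int k, int trials,
                           const DistFn& dist, std::mt19937& rng) {
    double worst = 0.0;
    std::vector<int> pool(V);
    for (int t = 0; t < trials; ++t) {
        std::shuffle(pool.begin(), pool.end(), rng);
        std::vector<int> C(pool.begin(), pool.begin() + k);
        worst = std::max(worst, empirical_error(D, X, C, dist));
    }
    return worst;
}

// Baseline coreset: points drawn uniformly at random from X, each assigned
// equal weight so that the weights sum to |X|.
PointSet uniform_sample_coreset(const PointSet& X, std::size_t size, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    const double w = static_cast<double>(X.size()) / static_cast<double>(size);
    PointSet D;
    D.reserve(size);
    for (std::size_t i = 0; i < size; ++i) D.push_back({X[pick(rng)].vertex, w});
    return D;
}

// Repeat the coreset construction and report the average of the maximum
// empirical error (10 repetitions in the paper's setup).
double average_max_error(const std::function<PointSet(std::mt19937&)>& build_coreset,
                         const PointSet& X, const std::vector<int>& V, int k,
                         const DistFn& dist, int repetitions = 10, int trials = 2000) {
    std::mt19937 rng(20200713);  // arbitrary fixed seed, for reproducibility of the sketch
    double total = 0.0;
    for (int r = 0; r < repetitions; ++r) {
        PointSet D = build_coreset(rng);
        total += max_empirical_error(D, X, V, k, trials, dist, rng);
    }
    return total / repetitions;
}
```

In this sketch, the paper's importance-sampling construction would be supplied as the `build_coreset` callback and compared, at the same coreset size, against `uniform_sample_coreset` as the baseline.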