Coresets for Clustering in Graphs of Bounded Treewidth
Authors: Daniel Baker, Vladimir Braverman, Lingxiao Huang, Shaofeng H.-C. Jiang, Robert Krauthgamer, Xuan Wu
ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our algorithm and evaluate its performance on real-world road networks. Our implementation generally follows the importance sampling algorithm as in Section 3. |
| Researcher Affiliation | Academia | (1) Johns Hopkins University, USA; (2) Yale University, USA; (3) Weizmann Institute of Science, Israel. |
| Pseudocode | Yes | Algorithm 1 ITERATEDTHORUPSAMPLING; Algorithm 2 THOSAMPLEBEST |
| Open Source Code | Yes | Our software is open source and freely available, and implemented in C++17. |
| Open Datasets | Yes | Throughout the experiments the graph G is a road network of New York State extracted from Open Street Map (Open Street Map contributors, 2020) and clipped by bounding box to enclose New York City (NYC). |
| Dataset Splits | No | The paper describes how data points X are chosen and evaluated (e.g., X_uni, X_man) and mentions evaluating empirical error. However, it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts for a model training process, as this is a clustering coreset rather than a supervised learning task. |
| Hardware Specification | Yes | All experiments were performed on a Lenovo x3850 X6 system with 4 2.0 GHz Intel E7-4830 CPUs, each with 14 processor cores with hyperthreading enabled. The system had 1 TB of RAM. |
| Software Dependencies | No | The paper states, "Our software is open source and freely available, and implemented in C++17." While C++17 is a specific version, it does not list additional key software components or libraries with their specific version numbers that would be needed for replication. |
| Experiment Setup | Yes | We define the empirical error of a coreset D and a center set C ⊆ V as err(D, C) := \|cost(D, C)/cost(X, C) − 1\| (corresponding to ϵ in the definition of a coreset). Since by definition a coreset preserves the objective for all center sets, we evaluate the empirical error by randomly picking 2000 center sets C ⊆ V, and reporting the maximum empirical error err(D, C) over all these C. For the sake of evaluation, we compare the maximum empirical error of our coreset with a baseline of a uniform sample, where points are drawn uniformly at random from X and assigned equal weight (that sums to \|X\|). To reduce the variance introduced by the randomness in the coreset construction, we repeat each construction 10 times and report the average of their maximum empirical error. |
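
The evaluation protocol quoted in the Experiment Setup row can be summarized in code. The following C++17 snippet is a minimal sketch, not the authors' released implementation: the shortest-path distance oracle `dist`, the coreset builder passed as a callback, and all identifiers here are illustrative assumptions. It shows the empirical error err(D, C), its maximum over 2000 uniformly random center sets, the uniform-sample baseline, and the 10-repetition average described above.

```cpp
// Minimal sketch of the coreset evaluation protocol (not the authors' code).
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <limits>
#include <random>
#include <vector>

struct WeightedPoint { int vertex; double weight; };
using PointSet = std::vector<WeightedPoint>;
// Assumed oracle: shortest-path distance between two vertices of the road network.
using DistFn = std::function<double(int, int)>;

// Weighted clustering cost of `points` against center set `C` (k-median-style
// objective shown here; the exact objective follows the paper).
double clustering_cost(const PointSet& points, const std::vector<int>& C, const DistFn& dist) {
    double total = 0.0;
    for (const auto& p : points) {
        double best = std::numeric_limits<double>::infinity();
        for (int c : C) best = std::min(best, dist(p.vertex, c));
        total += p.weight * best;
    }
    return total;
}

// Empirical error err(D, C) = | cost(D, C) / cost(X, C) - 1 |.
double empirical_error(const PointSet& D, const PointSet& X,
                       const std::vector<int>& C, const DistFn& dist) {
    return std::abs(clustering_cost(D, C, dist) / clustering_cost(X, C, dist) - 1.0);
}

// Maximum empirical error over `trials` center sets of size k drawn uniformly
// at random from the vertex set V (2000 trials in the paper's setup).
double max_empirical_error(const PointSet& D, const PointSet& X,
                           const std::vector<int>& V, int k, int trials,
                           const DistFn& dist, std::mt19937& rng) {
    double worst = 0.0;
    std::vector<int> pool(V);
    for (int t = 0; t < trials; ++t) {
        std::shuffle(pool.begin(), pool.end(), rng);
        std::vector<int> C(pool.begin(), pool.begin() + k);
        worst = std::max(worst, empirical_error(D, X, C, dist));
    }
    return worst;
}

// Baseline coreset: points drawn uniformly at random from X, each assigned
// equal weight so that the weights sum to |X|.
PointSet uniform_sample_coreset(const PointSet& X, std::size_t size, std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, X.size() - 1);
    const double w = static_cast<double>(X.size()) / static_cast<double>(size);
    PointSet D;
    D.reserve(size);
    for (std::size_t i = 0; i < size; ++i) D.push_back({X[pick(rng)].vertex, w});
    return D;
}

// Repeat the coreset construction and report the average of the maximum
// empirical error (10 repetitions in the paper's setup).
double average_max_error(const std::function<PointSet(std::mt19937&)>& build_coreset,
                         const PointSet& X, const std::vector<int>& V, int k,
                         const DistFn& dist, int repetitions = 10, int trials = 2000) {
    std::mt19937 rng(20200713);  // arbitrary fixed seed, for reproducibility of the sketch
    double total = 0.0;
    for (int r = 0; r < repetitions; ++r) {
        PointSet D = build_coreset(rng);
        total += max_empirical_error(D, X, V, k, trials, dist, rng);
    }
    return total / repetitions;
}
```

In this sketch, the paper's importance-sampling construction would be supplied as the `build_coreset` callback and compared, at the same coreset size, against `uniform_sample_coreset` as the baseline.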