Approximately Pareto-optimal Solutions for Bi-Objective k-Clustering
Authors: Anna Arutyunova, Jan Eube, Heiko Röglin, Melanie Schmidt, Sarah Sturm, Julian Wargalla
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | we demonstrate in several experiments that the approximate Pareto front contains good clusterings that cannot be found by considering one of the objectives separately. We also demonstrate experimentally the usefulness of approximate Pareto fronts and our algorithms in different applications. |
| Researcher Affiliation | Academia | Anna Arutyunova, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Jan Eube, University of Bonn, Bonn, Germany; Heiko Röglin, University of Bonn, Bonn, Germany; Melanie Schmidt, Heinrich Heine University Düsseldorf, Düsseldorf, Germany; Sarah Sturm, University of Bonn, Bonn, Germany; Julian Wargalla, Heinrich Heine University Düsseldorf, Düsseldorf, Germany |
| Pseudocode | Yes | Algorithm 1: Approximate solution for k-diameter and k-separation |
| Open Source Code | Yes | The source code can be found at https://github.com/algo-hhu/paretoClustering. |
| Open Datasets | Yes | All data sets were downloaded from freely available sources, see Table 4 for information on the data sets. |
| Dataset Splits | No | The paper states 'We fix the number k of clusters to be the desired number of clusters in the ground truth.' and evaluates against ground truth, but does not specify explicit training, validation, or test splits, or how the data was partitioned for these purposes. |
| Hardware Specification | Yes | The experiments were performed on a machine with a 1.8 GHz Intel Core i7-8565U processor and 16 GB RAM. |
| Software Dependencies | Yes | The code was compiled using g++ version 11.4.0 with optimization flag -O2. |
| Experiment Setup | Yes | For every data set, we compute the approximate Pareto set for the desired number of clusters as follows. Recall that for a data set of size n there are only O(n²) possible values for the separation. We compute these values in time O(n²d) and sort them in increasing order in time O(n² log n). ... Starting with separation 0, we increase the separation in every step to the next largest value. Suppose the separation is σ in the current step; then we merge all points whose distance is at most σ. This can be done efficiently via a Union-Find data structure. Since the resulting clustering may have more than k clusters, we have to reduce the number of clusters to k. ... we use k-means++ [8] to cluster the centroids... (a minimal code sketch of this sweep follows the table) |
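The separation sweep quoted above is straightforward to illustrate. Below is a minimal C++ sketch of the Union-Find merging step, assuming the O(n²) pairwise distances have already been computed and sorted; the names `UnionFind`, `Edge`, and `mergeUpTo` are hypothetical and not taken from the authors' repository, and the step that reduces a clustering with more than k clusters down to k (done with k-means++ on the cluster centroids in the paper) is omitted.

```cpp
// Sketch of the separation sweep: merge all points whose pairwise distance
// is at most the current separation value, using a Union-Find structure.
// All identifiers here are hypothetical illustrations, not the authors' code.
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Disjoint-set union with path halving and union by size.
struct UnionFind {
    std::vector<int> parent, size;
    explicit UnionFind(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];
        return x;
    }
    bool unite(int a, int b) {  // returns true iff two clusters were merged
        a = find(a); b = find(b);
        if (a == b) return false;
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a; size[a] += size[b];
        return true;
    }
};

// One pairwise distance together with the indices of its two endpoints.
struct Edge { double d; int u, v; };

// Advance the sweep to separation sigma: merge every pair whose distance is
// at most sigma. `edges` must be sorted by distance; `next` is the sweep
// cursor, retained across calls. Returns the updated number of clusters.
int mergeUpTo(UnionFind& uf, const std::vector<Edge>& edges,
              std::size_t& next, double sigma, int clusters) {
    while (next < edges.size() && edges[next].d <= sigma) {
        if (uf.unite(edges[next].u, edges[next].v)) --clusters;
        ++next;
    }
    return clusters;
}
```

A driver loop would sort the O(n²) candidate separation values, call `mergeUpTo` once per value, and, whenever the resulting clustering still has more than k clusters, apply a reduction step such as the paper's k-means++ clustering of the centroids before recording a solution on the approximate Pareto front.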