Approximately Pareto-optimal Solutions for Bi-Objective k-Clustering

Authors: Anna Arutyunova, Jan Eube, Heiko Röglin, Melanie Schmidt, Sarah Sturm, Julian Wargalla

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "we demonstrate in several experiments that the approximate Pareto front contains good clusterings that cannot be found by considering one of the objectives separately. We also demonstrate experimentally the usefulness of approximate Pareto fronts and our algorithms in different applications."
Researcher Affiliation | Academia | Anna Arutyunova (Heinrich Heine University Düsseldorf, Germany); Jan Eube (University of Bonn, Germany); Heiko Röglin (University of Bonn, Germany); Melanie Schmidt (Heinrich Heine University Düsseldorf, Germany); Sarah Sturm (University of Bonn, Germany); Julian Wargalla (Heinrich Heine University Düsseldorf, Germany)
Pseudocode | Yes | "Algorithm 1: Approximate solution for k-diameter and k-separation"
Open Source Code | Yes | "The source code can be found at https://github.com/algo-hhu/paretoClustering."
Open Datasets | Yes | "All data sets were downloaded from freely available sources, see Table 4 for information on the data sets."
Dataset Splits | No | The paper states "We fix the number k of clusters to be the desired number of clusters in the ground truth." and evaluates against the ground truth, but it does not specify explicit training, validation, or test splits, nor how the data was partitioned for these purposes.
Hardware Specification | Yes | "The experiments were performed on a machine with a 1.8 GHz Intel Core i7-8565U processor and 16 GB RAM."
Software Dependencies | Yes | "The code was compiled using g++ version 11.4.0 with optimization flag -O2."
Experiment Setup | Yes | "For every data set, we compute the approximate Pareto set for the desired number of clusters as follows. Recall that for a data set of size n there are only O(n²) possible values for the separation. We compute these values in time O(n²d) and sort them in increasing order in time O(n² log(n)). ... Starting with separation 0, we increase the separation in every step to the next largest value. Suppose the separation is s in the current step; then we merge all points whose distance is at most s. This can be done efficiently via a union-find data structure. Since the resulting clustering may have more than k clusters, we have to reduce the number of clusters to k. ... we use k-means++ [8] to cluster the centroids ..."
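
The quoted setup describes a standard separation sweep with a union-find structure. Below is a minimal C++ sketch of that sweep, not the authors' implementation: it assumes Euclidean point data, and the names `UnionFind`, `Edge`, `separationSweep`, and the `onStep` callback are illustrative. The final reduction of a clustering with more than k clusters down to exactly k (which the paper does via k-means++ on the centroids) is omitted here.

```cpp
// Sketch of the separation sweep: all O(n^2) pairwise distances are the
// candidate separation values; processing them in increasing order and
// merging with a union-find yields the sequence of merge steps.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct UnionFind {
    std::vector<int> parent, rank_;
    explicit UnionFind(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];  // path halving
        return x;
    }
    bool unite(int a, int b) {  // union by rank; returns false if already joined
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
        return true;
    }
};

struct Edge { double dist; int u, v; };

// Sweeps over all pairwise distances in increasing order. Each successful
// merge reports the current separation value and the number of remaining
// components via onStep(separation, components).
template <typename Callback>
void separationSweep(const std::vector<std::vector<double>>& points,
                     Callback onStep) {
    const int n = static_cast<int>(points.size());
    std::vector<Edge> edges;
    edges.reserve(static_cast<std::size_t>(n) * (n - 1) / 2);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            double d2 = 0.0;  // squared Euclidean distance between points i and j
            for (std::size_t t = 0; t < points[i].size(); ++t) {
                const double diff = points[i][t] - points[j][t];
                d2 += diff * diff;
            }
            edges.push_back({std::sqrt(d2), i, j});
        }
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.dist < b.dist; });

    UnionFind uf(n);
    int components = n;
    onStep(0.0, components);  // separation 0: every point is its own cluster
    for (const Edge& e : edges)
        if (uf.unite(e.u, e.v)) onStep(e.dist, --components);
    // Reducing a clustering with more than k components to exactly k
    // (e.g., via k-means++ on the component centroids) is not shown here.
}
```

Note that this sketch fires the callback once per successful merge; grouping edges of equal distance before invoking it would match the quoted description of stepping through distinct separation values, and the two views coincide up to ties.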