Approximately Pareto-optimal Solutions for Bi-Objective k-Clustering

Authors: Anna Arutyunova, Jan Eube, Heiko Röglin, Melanie Schmidt, Sarah Sturm, Julian Wargalla

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "we demonstrate in several experiments that the approximate Pareto front contains good clusterings that cannot be found by considering one of the objectives separately. We also demonstrate experimentally the usefulness of approximate Pareto fronts and our algorithms in different applications."
Researcher Affiliation | Academia | Anna Arutyunova (Heinrich Heine University Düsseldorf, Germany); Jan Eube (University of Bonn, Germany); Heiko Röglin (University of Bonn, Germany); Melanie Schmidt (Heinrich Heine University Düsseldorf, Germany); Sarah Sturm (University of Bonn, Germany); Julian Wargalla (Heinrich Heine University Düsseldorf, Germany)
Pseudocode | Yes | "Algorithm 1: Approximate solution for k-diameter and k-separation"
Open Source Code | Yes | "The source code can be found at https://github.com/algo-hhu/paretoClustering."
Open Datasets | Yes | "All data sets were downloaded from freely available sources, see Table 4 for information on the data sets."
Dataset Splits | No | The paper states "We fix the number k of clusters to be the desired number of clusters in the ground truth." and evaluates against the ground truth, but it does not specify explicit training, validation, or test splits, nor how the data was partitioned for these purposes.
Hardware Specification | Yes | "The experiments were performed on a machine with a 1.8 GHz Intel Core i7-8565U processor and 16 GB RAM."
Software Dependencies | Yes | "The code was compiled using g++ version 11.4.0 with optimization flag -O2."
Experiment Setup | Yes | "For every data set, we compute the approximate Pareto set for the desired number of clusters as follows. Recall that for a data set of size n there are only O(n²) possible values for the separation. We compute these values in time O(n²d) and sort them in increasing order in time O(n² log(n)). ... Starting with separation 0, we increase the separation in every step to the next largest value. Suppose the separation is s in the current step; then we merge all points whose distance is at most s. This can be done efficiently via a union-find data structure. Since the resulting clustering may have more than k clusters, we have to reduce the number of clusters to k. ... we use k-means++ [8] to cluster the centroids ..."
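
The quoted setup describes a standard separation sweep with a union-find structure. Below is a minimal C++ sketch of that sweep, not the authors' implementation: it assumes Euclidean point data, and the names `UnionFind`, `Edge`, `separationSweep`, and the `onStep` callback are illustrative. The final reduction of a clustering with more than k clusters down to exactly k (which the paper does via k-means++ on the centroids) is omitted here.

```cpp
// Sketch of the separation sweep: all O(n^2) pairwise distances are the
// candidate separation values; processing them in increasing order and
// merging with a union-find yields the sequence of merge steps.
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

struct UnionFind {
    std::vector<int> parent, rank_;
    explicit UnionFind(int n) : parent(n), rank_(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];  // path halving
        return x;
    }
    bool unite(int a, int b) {  // union by rank; returns false if already joined
        a = find(a); b = find(b);
        if (a == b) return false;
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
        return true;
    }
};

struct Edge { double dist; int u, v; };

// Sweeps over all pairwise distances in increasing order. Each successful
// merge reports the current separation value and the number of remaining
// components via onStep(separation, components).
template <typename Callback>
void separationSweep(const std::vector<std::vector<double>>& points,
                     Callback onStep) {
    const int n = static_cast<int>(points.size());
    std::vector<Edge> edges;
    edges.reserve(static_cast<std::size_t>(n) * (n - 1) / 2);
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j) {
            double d2 = 0.0;  // squared Euclidean distance between points i and j
            for (std::size_t t = 0; t < points[i].size(); ++t) {
                const double diff = points[i][t] - points[j][t];
                d2 += diff * diff;
            }
            edges.push_back({std::sqrt(d2), i, j});
        }
    std::sort(edges.begin(), edges.end(),
              [](const Edge& a, const Edge& b) { return a.dist < b.dist; });

    UnionFind uf(n);
    int components = n;
    onStep(0.0, components);  // separation 0: every point is its own cluster
    for (const Edge& e : edges)
        if (uf.unite(e.u, e.v)) onStep(e.dist, --components);
    // Reducing a clustering with more than k components to exactly k
    // (e.g., via k-means++ on the component centroids) is not shown here.
}
```

Note that this sketch fires the callback once per successful merge; grouping edges of equal distance before invoking it would match the quoted description of stepping through distinct separation values, and the two views coincide up to ties.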