Improving Ultrametrics Embeddings Through Coresets

Authors: Vincent Cohen-Addad, Rémi De Joannis De Verclos, Guillaume Lagarde

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We performed experiments to compare our implementation of Algorithm 1 using core-sets to standard agglomerative clustering algorithms (Ward, Single, Centroid). Our implementation is coded using the Cython extension for Python and relies on the C++ library miniball, based on the algorithm in (Fischer et al., 2003), through its Cython binding cyminiball to compute MEBs.
Researcher Affiliation | Collaboration | (1) Google Research, Zurich; (2) remi.de.joannis.de.verclos@ens-lyon.org; (3) LaBRI, CNRS. Correspondence to: Guillaume Lagarde <guillaume.lagarde@labri.fr>.
Pseudocode | Yes | Algorithm 1 γ δ-approximation for BUF
Open Source Code | No | The paper does not state that code will be released and gives no link to a code repository for the described methodology.
Open Datasets | Yes | The running time and distortion on four classic datasets (IRIS, MICE, PENDIGITS, SHUTTLE; see Table 1 for a complete description; all datasets are from the UCI ML repository (Dua & Graff, 2017)) are reported in Table 2.
Dataset Splits | No | The paper names the datasets it uses (IRIS, MICE, PENDIGITS, SHUTTLE) but gives no split information (exact percentages, sample counts, or splitting methodology) for training, validation, or testing.
Hardware Specification | Yes | The tests were run on a laptop with 8GB of memory and an Intel i5-8265U processor with frequency 1.60GHz.
Software Dependencies | No | The paper mentions using 'Cython extension for Python', 'C++ library miniball', 'cyminiball', 'Scikit-learn library', and 'fastcluster library', but it does not provide version numbers for any of these software dependencies.
Experiment Setup | Yes | Core Set is our algorithm, using the parameter ε = 0.2 for core-sets.
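
To give a feel for what a core-set parameter such as ε = 0.2 controls when computing minimum enclosing balls (MEBs), here is a minimal sketch. It is not the authors' implementation: the paper computes MEBs with miniball through cyminiball, whereas this sketch swaps in the well-known Bădoiu-Clarkson iteration, which returns a ball of radius at most (1 + ε) times the optimal one. The function name `approx_meb` and the sample points are hypothetical and chosen only for illustration.

```python
import math


def approx_meb(points, eps=0.2):
    """Approximate a minimum enclosing ball (illustrative sketch only).

    Uses the Badoiu-Clarkson iteration, NOT the miniball library named
    in the paper. Returns (center, radius, coreset), where `coreset`
    lists the farthest points visited along the way.
    """
    center = list(points[0])          # start from an arbitrary input point
    coreset = [points[0]]
    iterations = math.ceil(1.0 / eps ** 2)
    for i in range(1, iterations + 1):
        # Farthest input point from the current center.
        far = max(points, key=lambda p: math.dist(center, p))
        coreset.append(far)
        # Move the center a 1/(i+1) fraction of the way toward it.
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    # Smallest radius around the final center that covers every point.
    radius = max(math.dist(center, p) for p in points)
    return center, radius, coreset


# Hypothetical sample data: the four corners of the unit square.
sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center, radius, coreset = approx_meb(sample, eps=0.2)
print(center, radius, len(coreset))
```

A smaller ε tightens the radius guarantee at the cost of more iterations (about 1/ε² of them), which is the usual accuracy versus running-time trade-off behind such a parameter.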