Improving Ultrametrics Embeddings Through Coresets
Authors: Vincent Cohen-Addad, Rémi De Joannis De Verclos, Guillaume Lagarde
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We performed experiments to compare our implementation of Algorithm 1 using core-sets to standard agglomerative clustering algorithms (Ward, Single, Centroid). Our implementation is coded using the Cython extension for Python and relies on the C++ library miniball based on the algorithm in (Fischer et al., 2003) through its Cython binding cyminiball to compute MEBs. |
| Researcher Affiliation | Collaboration | ¹Google Research, Zurich; ²remi.de.joannis.de.verclos@ens-lyon.org; ³LaBRI, CNRS. Correspondence to: Guillaume Lagarde <guillaume.lagarde@labri.fr>. |
| Pseudocode | Yes | Algorithm 1: γ·δ-approximation for BUF |
| Open Source Code | No | The paper does not provide any explicit statements about releasing code or links to a code repository for the described methodology. |
| Open Datasets | Yes | The running time and distortion on four classic datasets (IRIS, MICE, PENDIGITS, SHUTTLE; see Table 1 for a complete description; all datasets are from the UCI ML repository (Dua & Graff, 2017)) are reported in Table 2. |
| Dataset Splits | No | The paper mentions using specific datasets (IRIS, MICE, PENDIGITS, SHUTTLE) but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | Yes | The tests have been made on a laptop with 8GB of memory and an Intel i5-8265U processor with frequency 1.60GHz. |
| Software Dependencies | No | The paper mentions using 'Cython extension for Python', 'C++ library miniball', 'cyminiball', 'Scikit-learn library', and 'fastcluster library', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | Core Set is our algorithm, using the parameter ε = 0.2 for core-sets. |
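
The table notes that distortion is reported for the agglomerative baselines, including Single linkage. As a rough illustration only, the sketch below builds the single-linkage ultrametric on a tiny point set and measures one plausible notion of distortion. It is not the paper's implementation (which uses Cython and cyminiball) and does not use Scikit-learn or fastcluster. The function names `single_linkage_ultrametric` and `distortion` are hypothetical, and the distortion formula (largest ratio of ultrametric to Euclidean distance divided by the smallest such ratio) is an assumption, since the definition is not quoted here. The points are assumed to be pairwise distinct.

```python
import itertools
import math


def single_linkage_ultrametric(points):
    """Map each pair (i, j), i < j, to the height at which single linkage merges them."""
    n = len(points)
    # All pairwise Euclidean distances, processed from shortest to longest.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    cluster_of = list(range(n))           # current cluster label of each point
    members = {i: [i] for i in range(n)}  # points in each live cluster
    ultra = {}
    for d, i, j in edges:
        a, b = cluster_of[i], cluster_of[j]
        if a == b:
            continue  # already merged at a smaller height
        # Every cross pair first meets at height d.
        for u in members[a]:
            for v in members[b]:
                ultra[(min(u, v), max(u, v))] = d
        for v in members[b]:
            cluster_of[v] = a
        members[a].extend(members.pop(b))
    return ultra


def distortion(points, ultra):
    """Assumed measure: max ratio / min ratio of ultrametric to Euclidean distance."""
    ratios = [
        ultra[(i, j)] / math.dist(points[i], points[j])  # assumes distinct points
        for (i, j) in ultra
    ]
    return max(ratios) / min(ratios)


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (3.0,), (7.0,)]  # toy data, not one of the UCI datasets
    tree_metric = single_linkage_ultrametric(pts)
    print(distortion(pts, tree_metric))  # about 1.75 for this toy input
```

This version takes quadratic memory and roughly quadratic time after sorting, so it only suits small inputs. For datasets the size of PENDIGITS or SHUTTLE, a library routine would be the practical choice.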