
Improving Ultrametrics Embeddings Through Coresets

Authors: Vincent Cohen-Addad, Rémi De Joannis De Verclos, Guillaume Lagarde

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We performed experiments to compare our implementation of Algorithm 1 using core-sets to standard agglomerative clustering algorithms (Ward, Single, Centroid). Our implementation is coded using the Cython extension for Python and relies on the C++ library miniball based on the algorithm in (Fischer et al., 2003) through its Cython binding cyminiball to compute MEBs.
Researcher Affiliation	Collaboration	(1) Google Research, Zurich; (2) remi.de.joannis.de.verclos@ens-lyon.org; (3) LaBRI, CNRS. Correspondence to: Guillaume Lagarde <guillaume.lagarde@labri.fr>.
Pseudocode	Yes	Algorithm 1 γ δ-approximation for BUF
Open Source Code	No	The paper does not provide any explicit statements about releasing code or links to a code repository for the described methodology.
Open Datasets	Yes	The running time and distortion on four classic datasets (IRIS, MICE, PENDIGITS, SHUTTLE, see Table 1 for a complete description, all datasets are from the UCI ML repository (Dua & Graff, 2017)) are reported on Table 2.
Dataset Splits	No	The paper mentions using specific datasets (IRIS, MICE, PENDIGITS, SHUTTLE) but does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification	Yes	The tests were run on a laptop with 8 GB of memory and an Intel i5-8265U processor with a frequency of 1.60 GHz.
Software Dependencies	No	The paper mentions using 'Cython extension for Python', 'C++ library miniball', 'cyminiball', 'Scikit-learn library', and 'fastcluster library', but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup	Yes	Core Set is our algorithm, using the parameter ε = 0.2 for core-sets.
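
The table above reports that the paper compares running time and distortion against agglomerative baselines, including single linkage. As a rough illustration only, the following sketch builds the single-linkage ultrametric for a small point set and measures its distortion. It is not the authors' code (none is released): the function names are hypothetical, and it assumes distortion means the worst-case ratio between the ultrametric and the Euclidean distance once the ultrametric has been rescaled to dominate the original metric. It uses only the Python standard library, not Cython or cyminiball.

```python
import itertools
import math


def single_linkage_ultrametric(points):
    """Return the single-linkage ultrametric as {(i, j): value} with i < j.

    Hypothetical helper for illustration; runs Kruskal-style merging over
    all pairwise Euclidean distances.
    """
    n = len(points)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    members = {i: [i] for i in range(n)}  # cluster id -> point indices
    cluster_of = list(range(n))           # point index -> cluster id
    ultra = {}
    for w, i, j in edges:
        a, b = cluster_of[i], cluster_of[j]
        if a == b:
            continue
        # Two clusters merge at height w, so every cross pair gets distance w.
        for x in members[a]:
            for y in members[b]:
                ultra[(min(x, y), max(x, y))] = w
        for y in members[b]:
            cluster_of[y] = a
        members[a].extend(members.pop(b))
    return ultra


def distortion(points, ultra):
    """Worst-case ratio ultra/d after rescaling ultra so that it dominates d.

    Assumes all points are distinct, so no distance is zero.
    """
    pairs = list(itertools.combinations(range(len(points)), 2))
    dist = {(i, j): math.dist(points[i], points[j]) for i, j in pairs}
    scale = max(dist[p] / ultra[p] for p in pairs)   # makes scale * ultra >= d
    stretch = max(ultra[p] / dist[p] for p in pairs)
    return scale * stretch


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    u = single_linkage_ultrametric(pts)
    print(u, distortion(pts, u))
```

For three collinear points at unit spacing, single linkage merges everything at height 1, so the pair at distance 2 is compressed by a factor of 2 and the sketch reports a distortion of 2. The quadratic enumeration of pairs is only meant for tiny inputs.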