Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Revenue Function for Comparison-Based Hierarchical Clustering

Authors: Aishik Mandal, Michaël Perrot, Debarghya Ghoshdastidar

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On the practical side, we present principled algorithms for comparison-based hierarchical clustering based on the maximisation of the revenue and we empirically compare them with existing methods. In this section, we propose two sets of experiments to demonstrate the practical relevance of our new revenue function and the corresponding algorithm. We consider a planted model and demonstrate the alignment between AARI scores, a supervised metric of goodness for clustering, and our proposed revenue function.
Researcher Affiliation | Academia | Aishik Mandal (EMAIL), Centre of Excellence in Artificial Intelligence, Indian Institute of Technology Kharagpur; Michaël Perrot (EMAIL), Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France; Debarghya Ghoshdastidar (EMAIL), Technical University of Munich, School of Computation, Information and Technology, Munich Data Science Institute
Pseudocode | No | The paper describes algorithms (Add S3-AL, Add S4-AL) as a procedure with 'Step 1' and 'Step 2' in Section 6. However, it does not present them in a structured pseudocode block with typical keywords or formatting (e.g., 'if', 'for', numbered lines, indentation) commonly associated with pseudocode or algorithm blocks. The description is prose-like rather than code-like.
Open Source Code | Yes | The code is available at https://github.com/jitaishik/Revenue_ComparisonHC.git
Open Datasets | Yes | On the one hand, we consider 3 standard clustering datasets: Zoo, Glass, and MNIST (Heller and Ghahramani, 2005; LeCun et al., 2010; Vikram and Dasgupta, 2016). On the other hand, we consider 5 comparison-based datasets, Car, Food, Vogue Cover, Nature Scene and ImageNet Images v0.1, from the cblearn repository.
Dataset Splits | No | The paper describes how synthetic data is generated and how comparisons are sampled (e.g., 'uniformly sampling kn² comparisons', 'randomly sampled 200 examples for each digit for MNIST'), and it mentions a 5% flip rate for comparisons. However, it does not explicitly provide traditional training/validation/test splits for a machine learning model, which is the primary focus of this question. The details provided concern data generation and comparison sampling, not standard dataset partitioning for model evaluation.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments. No specific GPU or CPU models, memory amounts, or detailed computing environments are mentioned.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. It does not mention any libraries, frameworks, or solvers along with their exact versions.
Experiment Setup | Yes | To generate the data in this first set of experiments, we use a standard planted model... In all the experiments, we set µ = 0.8, σ = 0.1, n0 = 30, L = 3 and we vary δ ∈ {0.02, 0.04, ..., 0.2}. Since we are in a comparison-based setting, we do not directly use the similarities of the planted model to learn dendrograms but instead generate comparisons. Given T_all and Q_all, the sets containing all possible triplets and quadruplets (see preliminaries), we obtain T ⊆ T_all and Q ⊆ Q_all by uniformly sampling kn² comparisons with k > 0. To model mistakes from human annotators, we randomly and uniformly flip 5% of the comparisons (Emamjomeh-Zadeh and Kempe, 2018), where by flipping (i, j, k) we mean replacing it with (i, k, j).
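The setup quoted above can be sketched in code. The parameter values (µ = 0.8, σ = 0.1, n0 = 30, L = 3, kn² sampled comparisons, a 5% flip rate) come from the quote; the exact planted-model structure (here, a balanced binary hierarchy whose expected similarity drops by δ for each tree level separating two points) and all function names are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def planted_similarities(mu=0.8, sigma=0.1, n0=30, L=3, delta=0.1, rng=rng):
    """One plausible planted model: 2**L bottom clusters of n0 points each.
    The expected similarity of a pair is mu minus delta per tree level
    separating their clusters, plus Gaussian noise with std sigma."""
    n = n0 * 2**L
    labels = np.arange(n) // n0          # bottom-cluster index of each point
    S = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            a, b = labels[i], labels[j]
            lvl = 0                      # levels separating the two clusters
            while a != b:
                a //= 2
                b //= 2
                lvl += 1
            S[i, j] = mu - delta * lvl
    S = S + rng.normal(0.0, sigma, (n, n))
    return (S + S.T) / 2                 # symmetrise the noise

def sample_triplets(S, k=1, flip=0.05, rng=rng):
    """Uniformly sample k*n**2 triplets (i, j, l), each meaning
    S[i, j] > S[i, l], then flip 5% of them uniformly at random to
    model annotator mistakes: (i, j, l) becomes (i, l, j)."""
    n = S.shape[0]
    m = int(k * n**2)
    triplets = []
    while len(triplets) < m:
        i, j, l = rng.choice(n, size=3, replace=False)
        triplets.append((i, j, l) if S[i, j] > S[i, l] else (i, l, j))
    T = np.array(triplets)
    idx = rng.choice(m, size=int(flip * m), replace=False)
    T[idx] = T[idx][:, [0, 2, 1]]        # swap the last two entries
    return T

S = planted_similarities(delta=0.1)      # n = n0 * 2**L = 240 points
T = sample_triplets(S, k=1)              # kn**2 = 57600 triplets, 5% flipped
```

By construction, exactly 95% of the returned triplets agree with the underlying similarities; the quadruplet set Q mentioned in the quote would be sampled analogously over pairs of pairs.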