Statistical Guarantees for Consensus Clustering

Authors: Zhixin Zhou, Gautam Dudeja, Arash A. Amini

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments show the effectiveness of the proposed methods.
Researcher Affiliation | Academia | Zhixin Zhou (City University of Hong Kong); Gautam Dudeja, Arash A. Amini (University of California, Los Angeles).
Pseudocode | Yes | Algorithm 1 (Basic label aggregation), Algorithm 2 (Spectral label aggregation), Algorithm 3 (Local Refinement). A generic, illustrative sketch of spectral label aggregation appears after this table.
Open Source Code | No | The paper does not include an explicit statement or link for the release of its own source code.
Open Datasets | No | The paper states: 'The ground truth label matrix Z is generated by randomly assigning each of the n objects to one of the K labels. The N input clusterings Zj, j ∈ [N], are generated from model (8).' This describes synthetic data generation, not the use of a publicly available dataset with concrete access information.
Dataset Splits | No | The paper describes generating synthetic data for its experiments but does not provide specific training, validation, or test splits. It mentions that 'results are averaged over 40 replications' but gives no data partitioning for model training or evaluation.
Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., specific GPU/CPU models or cloud instances).
Software Dependencies | No | The paper does not list specific software dependencies with version numbers used for the experiments.
Experiment Setup | Yes | 'The ground truth label matrix Z is generated by randomly assigning each of the n objects to one of the K labels. The N input clusterings Zj, j ∈ [N], are generated from model (8). We measure the performance of an algorithm by the adjusted Rand index (ARI) of its output against the ground truth. ... The settings in Figure 2 all correspond to balanced cluster sizes. Generally, our proposed Basic and SC algorithms outperform the EM, KCC, CCPivot and BOEM algorithms, with the failure thresholds occurring at larger values of p (harder problems).' Reported figure-panel settings include (a) n = 100, N = 20, K = 6; (b) n = 100, N = 200, K = 6; and n = 100, N = 20, K = 6 with p1 = 0.5. A hedged sketch of this setup follows the table.
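
The pseudocode row mentions a basic and a spectral label-aggregation algorithm, but the paper's Algorithms 1-3 are not reproduced on this page. The sketch below is only a minimal, generic illustration of spectral aggregation of input clusterings via the average co-association matrix, using scikit-learn's SpectralClustering; the function name aggregate_labels is hypothetical and this is not the paper's Algorithm 2.

```python
# Generic sketch: consensus clustering by spectral aggregation of the average
# co-association matrix. Illustration only, NOT the paper's Algorithm 2.
import numpy as np
from sklearn.cluster import SpectralClustering

def aggregate_labels(input_labels, K):
    """input_labels: (N, n) integer array, one input clustering of n objects per row."""
    N, n = input_labels.shape
    # Co-association matrix: fraction of input clusterings placing objects i, j together.
    C = np.zeros((n, n))
    for z in input_labels:
        C += (z[:, None] == z[None, :]).astype(float)
    C /= N
    # Treat C as a precomputed affinity and spectrally cluster the objects into K groups.
    sc = SpectralClustering(n_clusters=K, affinity="precomputed", random_state=0)
    return sc.fit_predict(C)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.integers(0, 3, size=(10, 30))   # 10 arbitrary input clusterings of 30 objects
    print(aggregate_labels(Z, K=3))
```

The co-association matrix is a common way to combine input clusterings into a single affinity that a spectral method can then partition; whether this matches the paper's construction is not verified here.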
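
For the experiment-setup row, the following is a minimal sketch of the described synthetic protocol, assuming a uniform label-flip noise model with flip probability p in place of the paper's model (8), whose exact form is not reproduced here. It draws ground-truth labels for n = 100 objects in K = 6 clusters, generates N = 20 noisy input clusterings, and reports the ARI of the inputs against the ground truth averaged over 40 replications; larger p corresponds to a harder problem.

```python
# Sketch of the synthetic setup under an ASSUMED uniform label-flip noise model;
# the paper's generative model (8) is not reproduced here.
import numpy as np
from sklearn.metrics import adjusted_rand_score

def average_input_ari(n=100, N=20, K=6, p=0.3, reps=40, seed=0):
    """Mean ARI of the N noisy input clusterings vs. the ground truth, over `reps` runs."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(reps):
        z = rng.integers(0, K, size=n)                  # ground-truth labels
        for _ in range(N):
            flip = rng.random(n) < p                    # positions whose labels are corrupted
            zj = np.where(flip, rng.integers(0, K, size=n), z)
            scores.append(adjusted_rand_score(z, zj))
    return float(np.mean(scores))

if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):                           # larger p = noisier inputs
        print(f"p = {p}: mean input ARI = {average_input_ari(p=p):.3f}")
```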