Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Statistical Guarantees for Consensus Clustering
Authors: Zhixin Zhou, Gautam Dudeja, Arash A. Amini
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments show the effectiveness of the proposed methods. |
| Researcher Affiliation | Academia | Zhixin Zhou (City University of Hong Kong), Gautam Dudeja, Arash A. Amini (University of California, Los Angeles) |
| Pseudocode | Yes | Algorithm 1 Basic label aggregation algorithm. Algorithm 2 Spectral label aggregation algorithm. Algorithm 3 Local Refinement |
| Open Source Code | No | The paper does not include an explicit statement or link for the release of its own source code. |
| Open Datasets | No | The paper states: 'The ground truth label matrix Z is generated by randomly assigning each of the n objects to one of the K labels. The N input clusterings Zj, j ∈ [N], are generated from model (8).' This describes synthetic data generation, not the use of a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes generating synthetic data for experiments but does not provide specific training, validation, or test dataset splits. It mentions 'results are averaged over 40 replications' but not data partitioning for model training/evaluation. |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., specific GPU/CPU models, cloud instances). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | The ground truth label matrix Z is generated by randomly assigning each of the n objects to one of the K labels. The N input clusterings Zj, j ∈ [N], are generated from model (8). We measure the performance of an algorithm by the adjusted Rand index (ARI) of its output against the ground truth. ... The settings in Figure 2 all correspond to balanced cluster sizes. Generally, our proposed Basic and SC algorithms outperform the EM, KCC, CCPivot and BOEM algorithms, with the failure thresholds occurring at larger values of p (harder problems). ... (a) n = 100, N = 20, K = 6 ... (b) n = 100, N = 200, K = 6 ... (a) n = 100, N = 20, K = 6, p1 = 0.5 |
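The experiment-setup quote describes generating random ground-truth labels and scoring clusterings by the adjusted Rand index (ARI). As a minimal sketch of that evaluation metric (a standard pure-Python ARI implementation, not the paper's code; `adjusted_rand_index` and the n = 100, K = 6 values are illustrative, matching the quoted settings):

```python
import math
import random

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same n objects."""
    n = len(labels_true)
    # Contingency counts of objects sharing each (true, predicted) label pair.
    contingency, row_sums, col_sums = {}, {}, {}
    for t, p in zip(labels_true, labels_pred):
        contingency[(t, p)] = contingency.get((t, p), 0) + 1
        row_sums[t] = row_sums.get(t, 0) + 1
        col_sums[p] = col_sums.get(p, 0) + 1
    sum_comb = sum(math.comb(v, 2) for v in contingency.values())
    sum_a = sum(math.comb(v, 2) for v in row_sums.values())
    sum_b = sum(math.comb(v, 2) for v in col_sums.values())
    expected = sum_a * sum_b / math.comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. a single cluster
        return 1.0
    return (sum_comb - expected) / (max_index - expected)

# Ground truth as in the quoted setup: n objects assigned uniformly at
# random to one of K labels (n = 100, K = 6 taken from the quoted settings).
n, K = 100, 6
random.seed(0)
z = [random.randrange(K) for _ in range(n)]

# ARI is invariant to relabeling: a permuted copy of z still scores 1.
perm = [(k + 1) % K for k in range(K)]
assert abs(adjusted_rand_index(z, [perm[k] for k in z]) - 1.0) < 1e-12
```

ARI corrects the raw Rand index for chance agreement, so a clustering unrelated to the ground truth scores near 0 while a perfect one scores 1, regardless of how the labels are numbered.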