Statistical Guarantees for Consensus Clustering
Authors: Zhixin Zhou, Gautam Dudeja, Arash A Amini
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments show the effectiveness of the proposed methods. |
| Researcher Affiliation | Academia | Zhixin Zhou¹, Gautam Dudeja², Arash A. Amini²; ¹City University of Hong Kong, ²University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: Basic label aggregation algorithm. Algorithm 2: Spectral label aggregation algorithm. Algorithm 3: Local Refinement. |
| Open Source Code | No | The paper does not include an explicit statement or link for the release of its own source code. |
| Open Datasets | No | The paper states: 'The ground truth label matrix Z is generated by randomly assigning each of the n objects to one of the K labels. The N input clusterings Z_j, j ∈ [N], are generated from model (8).' This describes synthetic data generation, not the use of a publicly available dataset with concrete access information. |
| Dataset Splits | No | The paper describes generating synthetic data for experiments but does not provide specific training, validation, or test dataset splits. It mentions 'results are averaged over 40 replications' but not data partitioning for model training/evaluation. |
| Hardware Specification | No | The paper does not mention any specific hardware used for running the experiments (e.g., specific GPU/CPU models, cloud instances). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers used for the experiments. |
| Experiment Setup | Yes | The ground truth label matrix Z is generated by randomly assigning each of the n objects to one of the K labels. The N input clusterings Z_j, j ∈ [N], are generated from model (8). We measure the performance of an algorithm by the adjusted Rand index (ARI) of its output against the ground truth. ... The settings in Figure 2 all correspond to balanced cluster sizes. Generally, our proposed Basic and SC algorithms outperform the EM, KCC, CCPivot and BOEM algorithms, with the failure thresholds occurring at larger values of p (harder problems). ... (a) n = 100, N = 20, K = 6 ... (b) n = 100, N = 200, K = 6 ... (a) n = 100, N = 20, K = 6, p1 = 0.5 |
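
The evaluation loop described in the Experiment Setup row (random ground-truth labels, ARI against the ground truth, averaging over 40 replications) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the paper's model (8) is not reproduced on this page, so `noisy_copy` is a hypothetical stand-in that resamples each label uniformly with probability `p`, and this `p` need not match the paper's parameter of the same name. The function names and the default values other than n = 100, K = 6 and 40 replications are likewise assumptions.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    # Pair counts within contingency cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use one cluster
    return (sum_cells - expected) / (max_index - expected)


def random_ground_truth(n, K, rng):
    """Assign each of the n objects to one of K labels uniformly at random."""
    return [rng.randrange(K) for _ in range(n)]


def noisy_copy(truth, K, p, rng):
    """HYPOTHETICAL noise model (not the paper's model (8)):
    with probability p, replace a label by a uniformly random one."""
    return [rng.randrange(K) if rng.random() < p else z for z in truth]


def mean_ari(n=100, K=6, p=0.3, replications=40, seed=0):
    """Average ARI of a noisy clustering against the ground truth."""
    rng = random.Random(seed)
    scores = []
    for _ in range(replications):
        truth = random_ground_truth(n, K, rng)
        clustering = noisy_copy(truth, K, p, rng)
        scores.append(adjusted_rand_index(truth, clustering))
    return sum(scores) / len(scores)
```

In an actual replication, the output of an aggregation algorithm (for example the Basic or SC methods named above) would take the place of `clustering`. ARI is invariant to label permutations, which is why it suits comparisons where cluster names carry no meaning.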