A General Clustering Agreement Index: For Comparing Disjoint and Overlapping Clusters
Authors: Reihaneh Rabbany, Osmar R. Zaïane
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical comparison of four well-known overlapping community detection methods, evaluated with both the previous and the newly proposed agreement indexes. In more detail, the paper uses the overlapping LFR benchmark generators (Lancichinetti, Fortunato, and Kertész 2008) to synthesize benchmarks with a varying fraction of overlapping nodes (10 realizations for each setting, reporting the average). |
| Researcher Affiliation | Academia | Reihaneh Rabbany, Osmar R. Zaïane; Department of Computing Science, University of Alberta, Edmonton, AB, Canada; {rabbanyk, zaiane}@ualberta.ca |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. |
| Open Datasets | No | The paper mentions using 'overlapping LFR benchmark generators (Lancichinetti, Fortunato, and Kertesz 2008)' to synthesize datasets. While it cites the generator, it does not provide concrete access information (e.g., URL, DOI, or specific parameters for exact reproduction) for the specific datasets generated and used in the experiments described in this paper. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The experimental settings are as follows. First, a set of benchmark datasets is generated using a generator that synthesizes networks with built-in ground-truth communities. Then, these datasets are clustered with different community detection algorithms. Finally, the results obtained from the different algorithms are compared against the ground truth in these benchmarks using a clustering agreement index. In more detail, the paper uses the overlapping LFR benchmark generators (Lancichinetti, Fortunato, and Kertész 2008) to synthesize benchmarks with a varying fraction of overlapping nodes (10 realizations for each setting, reporting the average). The overlapping community detection methods included in the comparison are: COPRA (Gregory 2010), MOSES (McDaid and Hurley 2010), OSLOM (Lancichinetti et al. 2011), and BIGCLAM (Yang and Leskovec 2013). The paper applies the overlapping extensions of NMI, i.e., NMI by Lancichinetti, Fortunato, and Kertész (2008) and NMI by McDaid, Greene, and Hurley (2011); the adjusted omega index (Aω) by Collins and Dent (1988); and the δ-based formulations of the ARI by Rabbany and Zaïane (2015), comparing them against the ARI and NMI overlapping extensions presented in this paper, i.e., CRI and CMI derived from the CAI generalization. |
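Of the baseline indexes listed in the setup, the adjusted omega index (Collins and Dent 1988) is the classic pairwise-counting measure that natively handles overlapping clusters: it scores the fraction of node pairs that are co-clustered the same number of times in both covers, corrected for chance. The sketch below is a minimal stdlib-only illustration of that definition; the function name, data layout (clusters as sets of node labels), and variable names are our own and not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def omega_index(clusters_a, clusters_b, nodes):
    """Adjusted omega index for two (possibly overlapping) covers.

    clusters_a, clusters_b: lists of sets of node labels; a node may
    belong to several sets. nodes: iterable of all node labels.
    """
    nodes = list(nodes)
    n_pairs = len(nodes) * (len(nodes) - 1) // 2

    def pair_counts(clusters):
        # For each node pair, count how many clusters contain both nodes.
        counts = {}
        for i, j in combinations(nodes, 2):
            counts[(i, j)] = sum(1 for c in clusters if i in c and j in c)
        return counts

    ca, cb = pair_counts(clusters_a), pair_counts(clusters_b)

    # Observed agreement: pairs co-clustered the same number of times.
    obs = sum(1 for p in ca if ca[p] == cb[p]) / n_pairs

    # Expected agreement under independence of the two covers.
    dist_a, dist_b = Counter(ca.values()), Counter(cb.values())
    exp = sum(dist_a[k] * dist_b[k] for k in dist_a) / n_pairs ** 2

    # Chance-adjusted score; 1.0 means identical pair co-occurrence counts.
    return 1.0 if exp == 1 else (obs - exp) / (1 - exp)
```

For disjoint clusterings every pair count is 0 or 1 and the index reduces to the adjusted Rand index, which is consistent with the paper's framing of omega as an overlapping generalization of ARI.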