A General Clustering Agreement Index: For Comparing Disjoint and Overlapping Clusters

Authors: Reihaneh Rabbany, Osmar Zaïane

AAAI 2017

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical comparison of four well-known overlapping community detection methods, obtained based on the previous and the newly proposed agreement indexes. In more detail, we use the overlapping LFR benchmark generators (Lancichinetti, Fortunato, and Kertesz 2008), to synthesize benchmarks with varying fraction of overlapping nodes (10 realizations for each setting to report the average). |
| Researcher Affiliation | Academia | Reihaneh Rabbany, Osmar R. Zaïane, Department of Computing Science, University of Alberta, Edmonton, AB, Canada. {rabbanyk, zaiane}@ualberta.ca |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology it describes. |
| Open Datasets | No | The paper mentions using 'overlapping LFR benchmark generators (Lancichinetti, Fortunato, and Kertesz 2008)' to synthesize datasets. While it cites the generator, it does not provide concrete access information (e.g., URL, DOI, or specific parameters for exact reproduction) for the specific datasets generated and used in its experiments. |
| Dataset Splits | No | The paper does not provide specific dataset split information (e.g., exact percentages, sample counts, or detailed splitting methodology) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types) used for running its experiments. |
| Software Dependencies | No | The paper does not name the software it depends on (e.g., libraries or solvers with version numbers) needed to replicate the experiments. |
| Experiment Setup | Yes | The experimental settings are as follows. First, we generate a set of benchmark datasets using a generator which synthesizes networks with built-in ground-truth communities. Then, these datasets are clustered with different community detection algorithms. Finally, the results obtained from different algorithms are compared against the ground-truth in these benchmarks, using a clustering agreement index. In more detail, we use the overlapping LFR benchmark generators (Lancichinetti, Fortunato, and Kertesz 2008), to synthesize benchmarks with varying fraction of overlapping nodes (10 realizations for each setting to report the average). The overlapping community detection methods included in this comparison are: COPRA (Gregory 2010), MOSES (McDaid and Hurley 2010), OSLOM (Lancichinetti et al. 2011), and BIGCLAM (Yang and Leskovec 2013). We apply the overlapping extensions of NMI, i.e., NMI by Lancichinetti, Fortunato, and Kertesz (2008) and NMI by McDaid, Greene, and Hurley (2011); the adjusted omega index (Aω) by Collins and Dent (1988); the δ-based formulations for the ARI by Rabbany and Zaïane (2015), to compare them against the ARI and NMI overlapping extensions presented in this paper, i.e., CRI and CMI derived from our CAI generalization. |
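
The comparison step in the experiment setup (score each detected clustering against the ground truth, then average over realizations) can be illustrated with one of the baseline indexes it names. The following is a minimal sketch of the chance-corrected omega index as it is commonly defined for overlapping covers, not the paper's code; CRI and CMI are not reproduced here. The function names `pair_counts`, `omega_index`, and `mean_agreement`, and the toy covers, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def pair_counts(cover):
    """Map each unordered node pair to the number of clusters containing both nodes."""
    counts = Counter()
    for cluster in cover:
        # Sorting gives every pair a canonical (smaller, larger) orientation.
        for pair in combinations(sorted(set(cluster)), 2):
            counts[pair] += 1
    return counts


def omega_index(cover_a, cover_b):
    """Chance-corrected omega index between two (possibly overlapping) covers."""
    nodes = set().union(*cover_a, *cover_b)
    n_pairs = len(nodes) * (len(nodes) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two nodes")

    counts_a, counts_b = pair_counts(cover_a), pair_counts(cover_b)

    # Observed agreement: pairs that share the same number of clusters in both covers.
    touched = set(counts_a) | set(counts_b)
    agree = sum(1 for p in touched if counts_a[p] == counts_b[p])
    agree += n_pairs - len(touched)  # pairs never co-clustered agree at count 0
    observed = agree / n_pairs

    # Expected agreement from the marginal histograms of co-membership counts.
    hist_a = Counter(counts_a.values())
    hist_b = Counter(counts_b.values())
    hist_a[0] = n_pairs - len(counts_a)
    hist_b[0] = n_pairs - len(counts_b)
    expected = sum(hist_a[j] * hist_b[j] for j in hist_a) / n_pairs**2

    if expected == 1.0:
        return 1.0  # degenerate case: both covers are trivially identical
    return (observed - expected) / (1.0 - expected)


def mean_agreement(realizations, index=omega_index):
    """Average an agreement index over (ground_truth, detected) cover pairs."""
    return mean(index(truth, found) for truth, found in realizations)


# Toy covers (hypothetical data, not taken from the paper); node 3 overlaps.
truth = [{1, 2, 3}, {3, 4, 5}]
same = omega_index(truth, truth)
crossed = omega_index([{1, 2}, {3, 4}], [{1, 3}, {2, 4}])
```

In this sketch the score for each benchmark realization would be computed separately and then passed through `mean_agreement`, mirroring the "10 realizations for each setting" averaging described in the setup; generating LFR benchmarks and running the community detection methods are outside its scope.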