Community Detection in Attributed Graphs: An Embedding Approach

Authors: Ye Li, Chaofeng Sha, Xin Huang, Yanchun Zhang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on 19 attributed graph datasets with overlapping and non-overlapping ground-truth communities show that our proposed model CDE can accurately identify attributed communities and significantly outperform 7 state-of-the-art methods.
Researcher Affiliation | Academia | (1) Shanghai Key Laboratory of Intelligence Processing, School of Computer Science, Fudan University; (2) Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University; (3) Department of Computer Science, Hong Kong Baptist University; (4) HKBU Shenzhen Institute of Research and Continuing Education; (5) Centre for Applied Informatics, College of Engineering and Science, Victoria University
Pseudocode | No | The paper describes mathematical equations and updating rules but does not present them in a structured pseudocode block or an algorithm box.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | For non-overlapping ground-truth communities, we use 6 datasets (Citeseer, Cora, Cornell, Texas, Washington, and Wisconsin), which are available at http://linqs.cs.umd.edu/projects/projects/lbc/. For overlapping ground-truth communities, we use graph datasets from 3 different domains: Philosophers network (Ahn, Bagrow, and Lehmann 2010), Flickr (Ruan, Fuhry, and Parthasarathy 2013), and Facebook.
Dataset Splits | No | The paper does not explicitly specify dataset splits (e.g., percentages or sample counts) for training, validation, or testing. It states that it uses datasets with ground-truth communities for evaluation.
Hardware Specification | Yes | All experiments are conducted on a Windows Server with a 64-core Xeon CPU (2.70 GHz) and 128 GB of main memory.
Software Dependencies | No | Our algorithms are implemented in Matlab and C++. No specific version numbers for Matlab, C++, or any libraries are provided.
Experiment Setup | Yes | CDE has three parameters: α is a nonnegative constant that controls the sparsity of the community-attribute matrix C, β is a positive constant that balances the contributions of node attributes and community structure embedding, and κ determines the number of negative samples used by the community structure embedding method. We first set α = β = 1 so that node attributes and community structure embedding are treated as equally important. Then we test CDE on Wisconsin by varying κ from 1 to 100. CDE achieves its maximum scores of AC = 0.6645 and NMI = 0.409 at κ = 25, which suggests that a suitable number of negative samples can improve the performance of CDE. For α and β, we set κ = 25 and vary each of α and β from 1 to 50. Figure 3 shows the corresponding results on the Wisconsin dataset: CDE achieves maximum scores of AC = 0.7321 and NMI = 0.4284 at α = 1 and β = 2. These results indicate that our community structure embedding method can extract, from the original network topology, the structural information essential for separating communities, and further demonstrate its superiority in encoding inherent community structures.
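The paper gives no pseudocode. The following is therefore only a hedged sketch of how the three parameters α, β and κ could interact in a nonnegative-factorization model of this kind. It is not CDE's published formulation.

The sketch assumes an objective of the form ½‖X − UC‖² + α·ΣC + (β/2)‖M − UH‖², with these pieces:

- X is the node-attribute matrix.
- U is a nonnegative node-community membership matrix.
- C is the community-attribute matrix, made sparse by α.
- H is a hypothetical auxiliary structure factor.
- M is a shifted positive PMI matrix of the adjacency matrix, max(PMI − log κ, 0). This is the matrix that skip-gram with κ negative samples is known to factorize implicitly.

The function names (`sppmi`, `fit`, `objective`) are illustrative. The update rules are standard Lee–Seung-style multiplicative updates derived for this assumed objective, not the paper's own rules.

```python
import math
import random

# Hypothetical sketch: the objective, the names (sppmi, fit, H) and the
# multiplicative update rules are assumptions, not the CDE paper's method.

def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sppmi(adj, kappa):
    """max(PMI(i, j) - log(kappa), 0) of a co-occurrence matrix, here assumed
    to be the adjacency matrix; kappa plays the role of the negative-sample count."""
    n = len(adj)
    row_sum = [sum(r) for r in adj]
    col_sum = [sum(c) for c in zip(*adj)]
    total = sum(row_sum)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if adj[i][j] > 0:  # PMI is undefined for zero counts; leave 0
                pmi = math.log(adj[i][j] * total / (row_sum[i] * col_sum[j]))
                M[i][j] = max(pmi - math.log(kappa), 0.0)
    return M

def sq_err(T, P):
    return sum((t - p) ** 2 for tr, pr in zip(T, P) for t, p in zip(tr, pr))

def objective(X, M, U, C, H, alpha, beta):
    """Assumed loss: 0.5||X - UC||^2 + alpha*sum(C) + 0.5*beta*||M - UH||^2."""
    return (0.5 * sq_err(X, matmul(U, C)) + alpha * sum(map(sum, C))
            + 0.5 * beta * sq_err(M, matmul(U, H)))

def mult_update(W, numer, denom, eps=1e-12):
    """Elementwise W * numer / (denom + eps); stays nonnegative if inputs are."""
    return [[w * nu / (d + eps) for w, nu, d in zip(wr, nr, dr)]
            for wr, nr, dr in zip(W, numer, denom)]

def add(A, B, scale=1.0):
    """Elementwise A + scale * B."""
    return [[a + scale * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def fit(X, adj, k, alpha=1.0, beta=1.0, kappa=25, iters=200, seed=0):
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    M = sppmi(adj, kappa)
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]  # node-community
    C = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]  # community-attribute
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]  # structure factor
    history = [objective(X, M, U, C, H, alpha, beta)]
    for _ in range(iters):
        Ut = transpose(U)
        UtU = matmul(Ut, U)
        # C: the L1 penalty alpha enters the denominator and shrinks C toward 0.
        C = mult_update(C, matmul(Ut, X),
                        [[v + alpha for v in row] for row in matmul(UtU, C)])
        # H: beta appears in both numerator and denominator, so it cancels.
        H = mult_update(H, matmul(Ut, M), matmul(UtU, H))
        # U: attribute term plus the beta-weighted structure term.
        numer = add(matmul(X, transpose(C)), matmul(M, transpose(H)), beta)
        denom = add(matmul(U, matmul(C, transpose(C))),
                    matmul(U, matmul(H, transpose(H))), beta)
        U = mult_update(U, numer, denom)
        history.append(objective(X, M, U, C, H, alpha, beta))
    labels = [row.index(max(row)) for row in U]  # hard assignment per node
    return labels, U, history

if __name__ == "__main__":
    # Toy graph: two 4-node cliques joined by the edge 3-4.
    adj = [[0] * 8 for _ in range(8)]
    groups = (range(4), range(4, 8))
    edges = [(i, j) for g in groups for i in g for j in g if i < j]
    for i, j in edges + [(3, 4)]:
        adj[i][j] = adj[j][i] = 1
    # Nodes 0-3 carry attributes 0 and 1; nodes 4-7 carry attributes 2 and 3.
    X = [[1, 1, 0, 0] for _ in range(4)] + [[0, 0, 1, 1] for _ in range(4)]
    labels, U, history = fit(X, adj, k=2, alpha=1.0, beta=2.0, kappa=1)
    print(labels, round(history[0], 3), round(history[-1], 3))
```

In this sketch, a larger κ raises the threshold log κ below which PMI entries are discarded, so M becomes sparser. In the 8-node toy graph, the largest PMI is log(26/9) ≈ 1.06. The paper's Wisconsin setting κ = 25 (log 25 ≈ 3.22) would therefore zero M entirely, which is why the demo uses κ = 1. The sketch does not compute the AC or NMI scores reported above.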