Semantic Community Identification in Large Attribute Networks

Authors: Xiao Wang, Di Jin, Xiaochun Cao, Liang Yang, Weixiong Zhang

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on synthetic and real-world networks not only show the superior performance of the new method over the state-of-the-art approaches, but also demonstrate its ability to semantically annotate the communities.
Researcher Affiliation | Academia | (1) School of Computer Science and Technology, Tianjin University, Tianjin 300072, China; (2) State Key Laboratory of Information Security, IIE, Chinese Academy of Sciences, Beijing 100093, China; (3) School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China; (4) College of Math and Computer Science, Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China; (5) Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
Pseudocode | No | The paper describes iterative updating rules with mathematical formulas but does not present them in structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement about releasing source code for the described methodology or a direct link to a code repository.
Open Datasets | Yes | Synthetic network: We first evaluated SCI on a synthetic network constructed using the widely adopted Newman's model (Girvan and Newman 2002). The Citeseer network [1] (6 communities) consists of 3312 scientific publications with 4732 edges, and the Cora network [1] (7 communities) consists of 2708 scientific publications with 5429 edges. The WebKB network [1] consists of 4 subnetworks gathered from 4 universities (Cornell, Texas, Washington and Wisconsin). Each subnetwork is divided into 5 communities. There are 877 webpages with 1608 edges. Each webpage is annotated by 1703-dimensional binary-valued word attributes. Here we used LASTFM dataset [2] from an online music system Last.fm, whose 1892 users are connected in a social network generated from Last.fm friend relations. [1] http://linqs.cs.umd.edu/projects/projects/lbc/ [2] http://ir.ii.uam.es/hetrec2011/datasets.html
Dataset Splits | No | The paper describes properties of the synthetic and real-world datasets and mentions 'ground-truth community labels' but does not specify explicit training/validation/test dataset splits with percentages or sample counts.
Hardware Specification | Yes | On a PC with RAM: 8G; CPU: Intel I7; Platform: Matlab, the running times are 0.4509s, 0.1917s, 0.3234s, 0.4571s, 88.6821s and 69.8382s, respectively.
Software Dependencies | No | The paper mentions 'Platform: Matlab' but does not specify a version number for Matlab or any other software dependencies with their versions.
Experiment Setup | Yes | Therefore we suggest to set β to either 1 or a value between 10 and 100 and fine tune α ∈ {1, 10, 20, ..., 100} so as to achieve a high performance.
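
The parameter ranges quoted under Experiment Setup can be enumerated as a small search grid. The following is a minimal Python sketch and not the authors' code (the paper reports Matlab and releases no source). The function names are hypothetical, and the step of 10 used to sample β between 10 and 100 is an assumption, since the quoted text gives only the range.

```python
# Illustrative sketch only: the function names are hypothetical, and the
# beta sampling step is an assumption. Only the value ranges come from the
# quoted Experiment Setup text.

def alpha_grid():
    """Candidate alpha values: {1, 10, 20, ..., 100}."""
    return [1] + list(range(10, 101, 10))


def beta_candidates(step=10):
    """Candidate beta values: 1, or a value between 10 and 100.

    The quoted text does not say how the 10..100 range is sampled;
    `step` is an assumed spacing.
    """
    return [1] + list(range(10, 101, step))


def search_grid(step=10):
    """All (alpha, beta) pairs to try, with beta as the outer loop."""
    return [(a, b) for b in beta_candidates(step) for a in alpha_grid()]


if __name__ == "__main__":
    grid = search_grid()
    print(len(grid), "candidate (alpha, beta) settings")
```

Each pair would then be passed to whatever implementation of the method is available and scored against the ground-truth community labels. The scoring step is left out here because the summary does not describe it.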