Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Semantic Community Identification in Large Attribute Networks

Authors: Xiao Wang, Di Jin, Xiaochun Cao, Liang Yang, Weixiong Zhang

AAAI 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experimental results on synthetic and real-world networks not only show the superior performance of the new method over the state-of-the-art approaches, but also demonstrate its ability to semantically annotate the communities.
Researcher Affiliation	Academia	(1) School of Computer Science and Technology, Tianjin University, Tianjin 300072, China; (2) State Key Laboratory of Information Security, IIE, Chinese Academy of Sciences, Beijing 100093, China; (3) School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China; (4) College of Math and Computer Science, Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China; (5) Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
Pseudocode	No	The paper describes iterative updating rules with mathematical formulas but does not present them in structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not include an unambiguous statement about releasing source code for the described methodology or a direct link to a code repository.
Open Datasets	Yes	Synthetic network: We first evaluated SCI on a synthetic network constructed using the widely adopted Newman's model (Girvan and Newman 2002). The Citeseer network [1] (6 communities) consists of 3312 scientific publications with 4732 edges, and the Cora [1] network (7 communities) consists of 2708 scientific publications with 5429 edges. The WebKB network [1] consists of 4 subnetworks gathered from 4 universities (Cornell, Texas, Washington and Wisconsin). Each subnetwork is divided into 5 communities. There are 877 webpages with 1608 edges. Each webpage is annotated by 1703-dimensional binary-valued word attributes. Here we used LASTFM dataset [2] from an online music system Last.fm, whose 1892 users are connected in a social network generated from Last.fm friend relations. Footnotes: [1] http://linqs.cs.umd.edu/projects/projects/lbc/ [2] http://ir.ii.uam.es/hetrec2011/datasets.html
Dataset Splits	No	The paper describes properties of the synthetic and real-world datasets and mentions 'ground-truth community labels' but does not specify explicit training/validation/test dataset splits with percentages or sample counts.
Hardware Specification	Yes	On a PC with RAM: 8G; CPU: Intel I7; Platform: Matlab, the running times are 0.4509s, 0.1917s, 0.3234s, 0.4571s, 88.6821s and 69.8382s, respectively.
Software Dependencies	No	The paper mentions 'Platform: Matlab' but does not specify a version number for Matlab or any other software dependencies with their versions.
Experiment Setup	Yes	Therefore we suggest to set β to either 1 or a value between 10 and 100 and fine tune α ∈ {1, 10, 20, ..., 100} so as to achieve a high performance.
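
The tuning advice quoted under Experiment Setup can be read as a small search grid. The sketch below is only an illustration of that reading, not code from the paper: the names ALPHA_GRID, BETA_GRID and hyperparameter_grid are hypothetical, and the step of 10 for β is an assumption, since the quoted text gives only "1 or a value between 10 and 100" for β.

```python
from itertools import product

# Candidate values for alpha, read from the quoted set {1, 10, 20, ..., 100}.
ALPHA_GRID = [1] + list(range(10, 101, 10))

# Candidate values for beta. The quoted text only says "either 1 or a value
# between 10 and 100"; the step of 10 used here is an assumption.
BETA_GRID = [1] + list(range(10, 101, 10))


def hyperparameter_grid():
    """Return every (alpha, beta) pair in the assumed search grid."""
    return list(product(ALPHA_GRID, BETA_GRID))


if __name__ == "__main__":
    grid = hyperparameter_grid()
    print(len(grid), "candidate (alpha, beta) settings")
```

Each pair would then be passed to whatever implementation of the method is being reproduced; the entry above records no released code, so that step is left out here.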