reproducibilityindex.ai

Exploiting k-Degree Locality to Improve Overlapping Community Detection

Authors: Hongyi Zhang, Michael R. Lyu, Irwin King

IJCAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compare our LNMF model with several baseline methods on various real-world networks, including large ones with ground-truth communities. Results show that our model outperforms state-of-the-art approaches.
Researcher Affiliation	Academia	(1) Shenzhen Key Laboratory of Rich Media Big Data Analytics and Applications, Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China; (2) Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Pseudocode	Yes	Algorithm 1 Community Detection via LNMF; Algorithm 2 Sampling Strategy
Open Source Code	No	The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that the code is publicly available.
Open Datasets	Yes	Six benchmark networks collected by Newman¹ are used as our datasets. [...] ¹ http://www-personal.umich.edu/~mejn/netdata/; Moreover, we choose three large networks with ground-truth communities collected by SNAP² [Yang and Leskovec, 2012] to test the scalability of our model. [...] ² http://snap.stanford.edu/data/
Dataset Splits	Yes	In detail, we first reserve 10% of the nodes as a validation set.
Hardware Specification	Yes	We conduct our experiments on a computer with a Xeon 2.60GHz CPU and 64GB memory.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup	Yes	We set the regularization coefficient to 0.5 and the convergence parameter ϵ to 0.001 for all experiments. The sample size t is determined according to the data size. For Newman's datasets, we set t = m, i.e., the number of links. For SNAP datasets, we set t = 10n, where n is the number of nodes, so that one iteration finishes without taking too much time. The maximum number of iterations is set to 100, although in practice all datasets converge before reaching this limit.
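The Dataset Splits row reports only that 10% of the nodes are held out as a validation set. A minimal Python sketch of such a split follows, assuming uniform random sampling without replacement and a fixed seed; the function name `split_validation_nodes`, the `seed` argument, and the rounding rule are hypothetical, and the source does not say how the validation nodes are used afterwards.

```python
import random


def split_validation_nodes(nodes, fraction=0.1, seed=0):
    """Reserve `fraction` of the nodes as a validation set (hypothetical helper).

    Assumption: nodes are drawn uniformly at random without replacement;
    the source only states that 10% of nodes are reserved first.
    Returns (training_nodes, validation_nodes).
    """
    nodes = list(nodes)
    rng = random.Random(seed)            # fixed seed makes the split repeatable
    k = round(fraction * len(nodes))     # assumed rounding rule for the 10% count
    validation = rng.sample(nodes, k)    # sampled without replacement
    held_out = set(validation)
    training = [v for v in nodes if v not in held_out]  # keeps original order
    return training, validation
```

For example, `split_validation_nodes(range(100))` yields 90 training nodes and 10 validation nodes; fixing the seed makes the split identical across runs, which matters when no code is released.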
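The Experiment Setup row can be captured as a small configuration. The sketch below is hypothetical: the class and function names, the `"newman"`/`"snap"` labels, and especially the stopping rule (relative change of the objective at most ϵ) are assumptions, since the source gives the parameter values but not how ϵ is applied. It encodes only the reported values, the sample-size rule (t = m for Newman's datasets, t = 10n for SNAP datasets), and the 100-iteration cap.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LNMFConfig:
    # Values reported in the Experiment Setup row; field names are hypothetical.
    reg_coef: float = 0.5   # regularization coefficient
    epsilon: float = 1e-3   # convergence parameter (epsilon)
    max_iter: int = 100     # maximum number of iterations


def sample_size(family: str, num_nodes: int, num_links: int) -> int:
    """Sample size t: t = m (number of links) for Newman's datasets, t = 10n for SNAP."""
    if family == "newman":
        return num_links
    if family == "snap":
        return 10 * num_nodes
    raise ValueError(f"unknown dataset family: {family!r}")


def run_until_converged(step: Callable[[], float], cfg: LNMFConfig) -> int:
    """Call step() until the objective stops changing; return iterations used.

    Assumption: convergence means the relative change of the objective
    returned by step() is at most cfg.epsilon. The source does not state
    the exact criterion.
    """
    prev = None
    for it in range(1, cfg.max_iter + 1):
        obj = step()
        if prev is not None and abs(prev - obj) <= cfg.epsilon * max(abs(prev), 1e-12):
            return it
        prev = obj
    return cfg.max_iter  # cap reached without meeting the criterion
```

Keeping the reported values in one frozen object makes them easy to log alongside results; replacing the assumed stopping rule with the paper's actual criterion, once known, only touches `run_until_converged`.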