Exploiting k-Degree Locality to Improve Overlapping Community Detection

Authors: Hongyi Zhang, Michael R. Lyu, Irwin King

IJCAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our LNMF model with several baseline methods on various real-world networks, including large ones with ground-truth communities. Results show that our model outperforms state-of-the-art approaches.
Researcher Affiliation | Academia | ¹Shenzhen Key Laboratory of Rich Media Big Data Analytics and Applications, Shenzhen Research Institute, The Chinese University of Hong Kong, Shenzhen, China; ²Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Pseudocode | Yes | Algorithm 1 Community Detection via LNMF; Algorithm 2 Sampling Strategy
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described, nor does it explicitly state that the code is publicly available.
Open Datasets | Yes | Six benchmark networks collected by Newman¹ are used as our datasets. [...] ¹http://www-personal.umich.edu/~mejn/netdata/; Moreover, we choose three large networks with ground-truth communities collected by SNAP² [Yang and Leskovec, 2012] to test the scalability of our model. [...] ²http://snap.stanford.edu/data/
Dataset Splits | Yes | In details, we reserve 10% of nodes as validation set at first.
Hardware Specification | Yes | We conduct our experiments on a computer with a Xeon 2.60GHz CPU and 64GB memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | We set the regularization coefficient to be 0.5 and the convergence parameter ϵ to be 0.001 for all experiments. The sample size t is determined according to data size. For Newman's datasets, we set t = m, i.e., the number of links. For SNAP datasets, we set t = 10n in order to finish one iteration without taking too much time, where n is the number of nodes. The maximum times of iteration is set to 100, though in fact all datasets converge before reaching the limit.
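
The quoted setup and split can be written down as a small configuration sketch. This is only an illustration of the stated values, not the authors' code (the paper releases none). The names `sample_size` and `reserve_validation_nodes`, the `"newman"`/`"snap"` labels, the fixed seed, and the choice of uniform random sampling for the held-out nodes are all assumptions introduced here.

```python
import random

# Values quoted in the Experiment Setup and Dataset Splits rows.
REGULARIZATION = 0.5        # regularization coefficient
EPSILON = 1e-3              # convergence parameter
MAX_ITERATIONS = 100        # upper bound on iterations
VALIDATION_FRACTION = 0.10  # share of nodes reserved for validation


def sample_size(collection, n_nodes, n_links):
    """Return the sample size t: m for Newman's datasets, 10n for SNAP.

    The `collection` labels are hypothetical names used for this sketch.
    """
    if collection == "newman":
        return n_links
    if collection == "snap":
        return 10 * n_nodes
    raise ValueError(f"unknown collection: {collection!r}")


def reserve_validation_nodes(nodes, fraction=VALIDATION_FRACTION, seed=0):
    """Split nodes into (remaining, validation).

    Assumption: the validation nodes are drawn uniformly at random;
    the quoted text does not say how they are chosen.
    """
    nodes = list(nodes)
    rng = random.Random(seed)
    k = int(round(fraction * len(nodes)))
    held_out = set(rng.sample(nodes, k))
    remaining = [v for v in nodes if v not in held_out]
    return remaining, sorted(held_out)


# Hypothetical network sizes, chosen only to exercise the functions.
t_newman = sample_size("newman", n_nodes=1000, n_links=5000)
t_snap = sample_size("snap", n_nodes=1000, n_links=5000)
remaining, validation = reserve_validation_nodes(range(1000))
```

With these hypothetical sizes, `t_newman` equals the link count and `t_snap` equals ten times the node count, matching the two rules in the setup row.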