Efficient Community Search over Large Directed Graph: An Augmented Index-based Approach

Authors: Yankai Chen, Jie Zhang, Yixiang Fang, Xin Cao, Irwin King

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on six real large graphs show that our index-based query algorithm is up to two orders of magnitude faster than existing solutions.
Researcher Affiliation Academia 1Department of Computer Science and Engineering, The Chinese University of Hong Kong 2School of Computer Science and Engineering, Nanyang Technological University 3School of Computer Science and Engineering, The University of New South Wales
Pseudocode Yes Algorithm 1 Index construction algorithm: Bottom Up. Algorithm 2 Functions of the CUF data structure. Algorithm 3 Process vertices to create tree nodes.
Open Source Code No The paper does not provide an explicit statement or link for the open-source code of the described methodology.
Open Datasets Yes We use six real large directed graphs in Table 1, where d is the average degree of vertices. The Twitter dataset 3 is collected by Kristina Lerman. The eu-2015, arabic, it-2004, sk-2005 and uk-2007 datasets [Boldi et al., 2014] are available in the website4. 3https://www.isi.edu/lerman/downloads/twitter/twitter2010.html 4http://law.di.unimi.it/datasets.php
Dataset Splits No The paper describes generating sub-datasets by randomly selecting percentages of vertices (e.g., "20%, 40%, 60% and 80% of its vertices and obtain the four subgraphs induced by these vertices") for scalability evaluation, but does not explicitly provide training, validation, or test dataset splits in the conventional sense for model training.
Hardware Specification Yes We implement all the algorithms in Java, and run the experiments on a Linux machine with Ten-core Intel E74820 V3 CPU@1.90GHz and 300GB memory.
Software Dependencies No The paper states that algorithms are implemented "in Java", but does not provide specific version numbers for Java or any other software libraries, frameworks, or dependencies used.
Experiment Setup Yes As suggested in [Fang et al., 2019b], for each sub-dataset, we randomly select 200 vertices which are in the (8,8)-cores as the query vertices and set k=l=8. We depict the effect of k and l on the efficiency of CSD queries in Figure 4(g)-(r). For each dataset, we randomly select 200 vertices within the (8,8)-cores to query.