ProNE: Fast and Scalable Network Representation Learning

Authors: Jie Zhang, Yuxiao Dong, Yan Wang, Jie Tang, Ming Ding

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the efficiency and effectiveness of the ProNE method on multi-label node classification, a commonly used task for network embedding evaluation [Perozzi et al., 2014; Tang et al., 2015; Grover and Leskovec, 2016]. We conduct experiments on five real networks and a set of random graphs. Extensive demonstrations show that the one-thread ProNE model is about 10–400× faster than popular network embedding benchmarks with 20 threads, including DeepWalk, LINE, and node2vec (see Figure 1).
Researcher Affiliation | Collaboration | Jie Zhang [1], Yuxiao Dong [2], Yan Wang [1], Jie Tang [1], and Ming Ding [1]. [1] Department of Computer Science and Technology, Tsinghua University; [2] Microsoft Research, Redmond.
Pseudocode | No | The paper describes the model and its steps using text and mathematical equations, but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/THUDM/ProNE.
Open Datasets | Yes | Table 1 reports the statistics of the five datasets:

Dataset      #nodes     #edges     #labels
BlogCatalog  10,312     333,983    39
Wiki         4,777      184,812    40
PPI          3,890      76,584     50
DBLP         51,264     127,968    60
Youtube      1,138,499  2,990,443  47

BlogCatalog [Zafarani and Liu, 2009] is a social blogger network in which bloggers' interests are used as labels. Wiki (http://www.mattmahoney.net/dc/text.html) is a co-occurrence network of words in the first million bytes of the Wikipedia dump; node labels are the part-of-speech tags. PPI [Breitkreutz et al., 2008] is a subgraph of the PPI network for Homo sapiens. DBLP [Tang et al., 2008] is an academic citation network... Youtube [Zafarani and Liu, 2009] is a social network...
Dataset Splits | No | We randomly sample different percentages of labeled nodes for training a liblinear classifier and use the remaining for testing. (No explicit validation set or specific train/test/validation split percentages are mentioned, only "different percentages".)
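A minimal sketch of this evaluation protocol, assuming scikit-learn's LogisticRegression with the liblinear solver stands in for the liblinear classifier and that micro/macro-F1 is the reported metric; the split ratios below are illustrative, since the text only says "different percentages":

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier

    def evaluate(embeddings, labels, train_ratio, seed=0):
        """Train on a random fraction of labeled nodes, test on the rest."""
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, labels, train_size=train_ratio, random_state=seed)
        clf = OneVsRestClassifier(LogisticRegression(solver="liblinear"))
        clf.fit(X_tr, y_tr)
        y_pred = clf.predict(X_te)
        return (f1_score(y_te, y_pred, average="micro"),
                f1_score(y_te, y_pred, average="macro"))

    # Illustrative usage on random data; a real run would load ProNE embeddings.
    X = np.random.rand(1000, 128)                 # one 128-d embedding per node
    Y = np.random.randint(0, 2, size=(1000, 39))  # multi-label indicator matrix
    for ratio in (0.1, 0.5, 0.9):                 # the unspecified "percentages"
        micro, macro = evaluate(X, Y, ratio)
        print(f"train ratio {ratio:.0%}: micro-F1 {micro:.3f}, macro-F1 {macro:.3f}")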
Hardware Specification | Yes | The experiments were conducted on a Red Hat server with an Intel Xeon(R) CPU E5-4650 (2.70 GHz) and 1 TB of RAM.
Software Dependencies | No | ProNE is implemented in Python 3.6.1. (The paper also mentions the SciPy package, but without a version number, and a language version alone is not considered sufficient without other versioned libraries per the schema guidelines.)
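The missing version information is easy to capture at run time; a minimal snippet, assuming NumPy, SciPy, and scikit-learn are the libraries in play (only SciPy is named in the paper):

    import sys
    import numpy, scipy, sklearn

    # Record the exact environment so results can be tied to library versions.
    print("Python      :", sys.version.split()[0])
    print("NumPy       :", numpy.__version__)
    print("SciPy       :", scipy.__version__)
    print("scikit-learn:", sklearn.__version__)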
Experiment Setup | Yes | For a fair comparison, we set the embedding dimension d = 128 for all methods. For the other parameters, we follow the original authors' preferred choices. For DeepWalk and node2vec, window size m = 10, #walks per node r = 80, walk length t = 40. p and q in node2vec are searched over {0.25, 0.50, 1, 2, 4}. For LINE, #negative samples k = 5 and total sampling budget T = r · t · |V|. For GraRep, the dimension of the concatenated embedding is d = 128 for fairness. For HOPE, β is calculated in the authors' code and searched over (0, 1) for the best performance. For ProNE, the term number of the Chebyshev expansion k is set to 10, μ = 0.2, and θ = 0.5, which are the default settings.
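To make the ProNE-specific parameters concrete: k, μ, and θ control the Chebyshev-approximated spectral filter that propagates the factorization-based embeddings (here n × 128, per the setup above) over the graph. Below is a simplified sketch of such a filtered propagation, not the authors' released implementation: it approximates an exponential filter exp(-θ(L - μI)) via the standard Chebyshev/Bessel expansion, whereas the paper derives a Gaussian band-pass kernel.

    import numpy as np
    import scipy.sparse as sp
    from scipy.special import iv  # modified Bessel functions of the first kind

    def spectral_propagate(adj, emb, k=10, mu=0.2, theta=0.5):
        """Propagate embeddings `emb` (n x d) through a k-term Chebyshev
        filter of the normalized Laplacian of `adj` (n x n sparse matrix)."""
        n = adj.shape[0]
        deg = np.asarray(adj.sum(axis=1)).ravel()
        d_inv = sp.diags(1.0 / np.maximum(deg, 1e-12))
        L = sp.eye(n) - d_inv @ adj        # random-walk normalized Laplacian
        M = L - mu * sp.eye(n)             # shift the spectrum by mu
        # exp(z*x) = I_0(z) + 2 * sum_i I_i(z) * T_i(x), so exp(-theta*x)
        # alternates the signs of the Bessel coefficients.
        Tx0, Tx1 = emb, M @ emb
        filtered = iv(0, theta) * Tx0 - 2 * iv(1, theta) * Tx1
        for i in range(2, k):
            Tx0, Tx1 = Tx1, 2 * (M @ Tx1) - Tx0   # Chebyshev recurrence
            filtered += ((-1) ** i) * 2 * iv(i, theta) * Tx1
        out = adj @ (emb - filtered)       # one more propagation over edges
        return out / (np.linalg.norm(out, axis=1, keepdims=True) + 1e-12)

With the defaults above (k=10, mu=0.2, theta=0.5), this mirrors how the reported hyperparameters enter the propagation step, but the exact kernel and normalization should be taken from the released code at https://github.com/THUDM/ProNE.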