Efficient Attributed Network Embedding via Recursive Randomized Hashing

Authors: Wei Wu, Bin Li, Ling Chen, Chengqi Zhang

IJCAI 2018

Reproducibility assessment. Each variable below is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM Response: Our extensive experimental results show that the proposed algorithm, which does not need learning, runs significantly faster than the state-of-the-art learning-based network embedding methods while achieving competitive or even better performance in accuracy.

Researcher Affiliation: Academia
LLM Response: (1) Centre for Artificial Intelligence, University of Technology Sydney, Australia; (2) School of Computer Science, Fudan University, China. Emails: william.third.wu@gmail.com, libin@fudan.edu.cn, {ling.chen, chengqi.zhang}@uts.edu.au

Pseudocode: Yes
LLM Response:
Algorithm 1: The NetHash Algorithm
Input: G = (V, E, f); number of embedding dimensions K; entropy of degrees of the network S; depth of tree D ≥ 1; hash functions at the l-th level {π_k^(l)} for l = 0, ..., D and k = 1, ..., K_l
Output: G's embedding h
 1: for r = 1, ..., |V| do
 2:     Build a parent-pointer tree T for node r;
 3:     Initialize an empty auxiliary queue Q;
 4:     for v ∈ T do
 5:         l ← level of v in T;
 6:         merger ← f(v);  // initial merger from the attributes on v
 7:         while Q is not empty and v is the parent node of Q[0] in T do
 8:             merger ← merge(Q.pop().digest, merger);
 9:         end while
10:         digest ← MinHash(merger, {π_k^(l)} for k = 1, ..., K_l);
11:         Q.push({digest, v});
12:     end for
13:     h(r) ← Q.pop().digest;
14: end for

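To make the recursion concrete, below is a minimal, self-contained Python sketch of the recursive min-hashing idea, not the authors' implementation: each node's rooted tree is sketched bottom-up, child digests are merged into the parent's attribute set, and the merger is min-hashed at each level. The helper names (minhash, make_perms, nethash_node) are hypothetical, explicit permutation tables stand in for the paper's hash functions {π_k^(l)}, and the per-level digest sizes K_l (which the paper ties to the degree entropy S) are fixed here for brevity.

import random

def minhash(attr_set, perms):
    # Min-wise sampling: each random permutation picks the attribute with
    # the smallest permuted rank, so a digest is itself a list of attributes
    # and can be merged back into a parent's attribute set.
    return [min(attr_set, key=lambda a: p[a]) for p in perms]

def make_perms(universe, k, seed=0):
    # k explicit random permutations over the attribute universe, standing
    # in for the paper's hash functions at one tree level.
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        ranks = list(range(len(universe)))
        rng.shuffle(ranks)
        perms.append(dict(zip(universe, ranks)))
    return perms

def nethash_node(graph, attrs, root, depth, perms_per_level):
    # Sketch the rooted tree of `root` bottom-up: children are sketched
    # first, their digests are merged with the parent's own attributes,
    # and the merger is min-hashed with that level's permutations.
    def sketch(node, level, parent):
        merger = set(attrs[node])                  # initial merger: f(v)
        if level < depth:
            for nb in graph[node]:
                if nb != parent:                   # parent-pointer tree
                    merger |= set(sketch(nb, level + 1, node))
        return minhash(merger, perms_per_level[level])
    return sketch(root, 0, None)

# Toy usage: a tiny star graph with set-valued node attributes.
graph = {0: [1, 2], 1: [0], 2: [0]}
attrs = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
universe = sorted({a for s in attrs.values() for a in s})
perms_per_level = [make_perms(universe, 4, seed=l) for l in range(3)]
print(nethash_node(graph, attrs, root=0, depth=2, perms_per_level=perms_per_level))

The recursive formulation replaces the paper's explicit parent-pointer tree and auxiliary queue, but it should perform the same bottom-up merge-then-hash pass over the tree.
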
Open Source Code: No
LLM Response: No explicit statement about the release of the source code for the proposed NetHash algorithm, nor a link to a code repository, was found in the paper.

Open Datasets: Yes
LLM Response: Data sets: (1) Cora [Yang et al., 2015]: a citation network of machine learning papers. (2) Wikipedia [Yang et al., 2015]: a citation network of Wikipedia articles. (3) Flickr [Li et al., 2015]: users as nodes, following relationships as edges, and users' interest tags as attributes. (4) BlogCatalog [Li et al., 2015]: bloggers as nodes, following relationships as edges, and blog keywords as attributes. (5) ACM [Tang et al., 2008]: the original data contain 2,381,688 ACM papers and 10,476,564 citation relationships.

Dataset Splits: Yes
LLM Response: We vary the training ratio (i.e., the percentage of nodes used as the training set) over {50%, 60%, 70%, 80%, 90%}; for each ratio we repeat the experiment 10 times and average the results.

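As a rough, hypothetical illustration of this protocol (not the paper's code), the sketch below varies the training ratio over the same grid, repeats each split 10 times, and averages test accuracy; scikit-learn's LogisticRegression and synthetic features are assumed stand-ins for the classifiers (LIBLINEAR/LIBSVM) and embeddings the paper actually uses.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def evaluate(X, y, ratios=(0.5, 0.6, 0.7, 0.8, 0.9), repeats=10, seed=0):
    # For each training ratio, draw `repeats` random splits and average
    # the resulting test accuracies, as described above.
    results = {}
    for ratio in ratios:
        scores = []
        for i in range(repeats):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, train_size=ratio, random_state=seed + i)
            clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
            scores.append(accuracy_score(y_te, clf.predict(X_te)))
        results[ratio] = float(np.mean(scores))
    return results

X, y = make_classification(n_samples=500, n_features=200, random_state=0)  # stand-in embeddings
print(evaluate(X, y))
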
Hardware Specification: Yes
LLM Response: All experiments are conducted on a Linux cluster node with 8 × 3.4 GHz Intel Xeon CPUs (64-bit) and 32 GB RAM.

Software Dependencies: No
LLM Response: The paper mentions using LIBSVM and LIBLINEAR, but does not provide version numbers for these software dependencies.

Experiment Setup: Yes
LLM Response: For all methods, we set the embedding dimension K = 200, as in TADW and CANE. ... NetHash has two exclusive parameters, the tree depth D and the decay rate λ. ... We set D = 1 for Wikipedia, Flickr and BlogCatalog, and D = 2 for Cora and ACM. ... Hence, we adopt the entropy of node degrees S as the decay rate.

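On the last point, the paper adopts the entropy of node degrees S as the decay rate λ. Below is a small sketch of one plausible reading, assuming S is the Shannon entropy of the empirical degree distribution (in nats); degree_entropy is a hypothetical helper name, and the paper's exact definition should be checked against its text.

import math
from collections import Counter

def degree_entropy(graph):
    # Shannon entropy of the empirical degree distribution: with p(d) the
    # fraction of nodes of degree d, S = -sum_d p(d) * log p(d).
    degrees = [len(neighbours) for neighbours in graph.values()]
    counts = Counter(degrees)
    n = len(degrees)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Example: a star on four nodes has degrees [3, 1, 1, 1],
# giving S = -(1/4)log(1/4) - (3/4)log(3/4) ≈ 0.5623 nats.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(round(degree_entropy(star), 4))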