Anonymous Walk Embeddings

Authors: Sergey Ivanov, Evgeny Burnaev

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental study shows state-of-the-art classification accuracy of feature-based AWE on real datasets.
Researcher Affiliation | Collaboration | 1 Skolkovo Institute of Science and Technology, Moscow, Russia; 2 Criteo Research, Paris, France.
Pseudocode | Yes | Figure 3. A framework for learning data-driven anonymous walk embeddings.
Open Source Code | Yes | Code can be found at https://github.com/nd7141/AWE
Open Datasets | Yes | We evaluate performance on two sets of graphs. One set contains unlabeled graph data and is related to social networks (Yanardag & Vishwanathan, 2015). Another set contains graphs with labels on nodes and/or edges and originates from bioinformatics (Shervashidze et al., 2011). Statistics of these ten graph datasets are presented in Table 1.
Dataset Splits | Yes | We perform a 10-fold cross-validation and for each fold we estimate the SVM parameter C from the range [0.001, 0.01, 0.1, 1, 10] using a validation set.
Hardware Specification | Yes | We run the experiments on a machine with Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz and 32GB RAM.
Software Dependencies | No | The paper mentions using an 'SVM classifier' and 'sampled softmax' but does not specify versions for any key software libraries or dependencies.
Experiment Setup | Yes | For feature-based anonymous walk embeddings (Def. 1), we choose length l of walks from the range [2, 3, ..., 10] and approximate the actual distribution of anonymous walks using sampling equation (2) with ε = 0.1 and δ = 0.05. For data-driven anonymous walk embeddings (Def. 4), we set length of walks l = 10 to generate a corpus of co-occurring anonymous walks. We run gradient descent with 100 iterations for 100 epochs with batch size that we vary from the range [100, 500, 1000, 5000, 10000]. Context walks are drawn from a window whose size varies in the range [2, 4, 8, 16]. The embedding sizes of walks and graphs, d_a and d_g, equal 128. Finally, the candidate sampling function for softmax equation (4) chooses a uniform or log-uniform distribution of sampled classes. For the RBF kernel function we choose parameter σ from the range [10^-5, 10^-4, ..., 1, 10]; for the Polynomial function we set c = 0 and d = 2.
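
The feature-based setup above (walk length l, sampling with ε = 0.1 and δ = 0.05) can be illustrated with a minimal Python sketch. It is not the authors' implementation (that lives in the linked repository). It rests on several assumptions that the summary does not state: that sampling equation (2) has the form m = ⌈(2/ε²)(ln(2^η − 2) − ln δ)⌉, where η is the number of distinct anonymous walks of length l; that graphs have no self-loops; and that each sampled walk starts from a uniformly chosen node. All function names are hypothetical, and labels are 0-based here.

```python
import math
import random
from collections import Counter
from functools import lru_cache


def to_anonymous(walk):
    """Relabel nodes by order of first appearance: (a, b, a, c) -> (0, 1, 0, 2)."""
    first_seen = {}
    return tuple(first_seen.setdefault(node, len(first_seen)) for node in walk)


@lru_cache(maxsize=None)
def _extend(last, max_label, remaining):
    """Count ways to append `remaining` more labels to a partial anonymous walk."""
    if remaining == 0:
        return 1
    total = 0
    # The next label is any label used so far or exactly one new label,
    # but never the current one (assumption: no self-loops).
    for label in range(max_label + 2):
        if label != last:
            total += _extend(label, max(max_label, label), remaining - 1)
    return total


def count_anonymous_walks(length):
    """Number of distinct anonymous walks with `length` edges (eta in the text)."""
    return _extend(0, 0, length)


def required_samples(num_walk_types, eps, delta):
    """Assumed form of equation (2): ceil((2/eps^2) * (ln(2^eta - 2) - ln(delta)))."""
    log_term = math.log(2 ** num_walk_types - 2) - math.log(delta)
    return math.ceil(2.0 / eps ** 2 * log_term)


def sample_walk_distribution(adjacency, length, num_samples, rng):
    """Empirical distribution of anonymous walks from uniform random walks.

    `adjacency` maps each node to a list of neighbours; starting nodes are
    drawn uniformly, which is an assumption of this sketch.
    """
    nodes = list(adjacency)
    counts = Counter()
    for _ in range(num_samples):
        node = rng.choice(nodes)
        walk = [node]
        for _ in range(length):
            node = rng.choice(adjacency[node])
            walk.append(node)
        counts[to_anonymous(walk)] += 1
    return {walk_type: c / num_samples for walk_type, c in counts.items()}


if __name__ == "__main__":
    triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
    length = 3
    eta = count_anonymous_walks(length)
    m = required_samples(eta, eps=0.1, delta=0.05)
    dist = sample_walk_distribution(triangle, length, m, random.Random(0))
    print(eta, m, sorted(dist.items()))
```

Under the assumed formula, the sample count grows roughly linearly in η, and η grows quickly with l, so the largest walk lengths in the quoted range dominate the sampling cost.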