Residual2Vec: Debiasing graph embedding with random graphs

Authors: Sadamori Kojaku, Jisung Yoon, Isabel Constantino, Yong-Yeol Ahn

NeurIPS 2021

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. Evidence: "We demonstrate that this debiasing not only improves link prediction and clustering performance but also allows us to explicitly model salient structural properties in graph embedding. We test residual2vec using link prediction and community detection benchmarks [4, 22, 38–40]. We use the soft configuration model [31] as the null graph for residual2vec, denoted by r2v-config, which yields a degree-debiased embedding."
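To make the r2v-config null model concrete, here is a minimal sketch of the soft configuration model's expected adjacency matrix, assuming the standard k_i·k_j / 2m form; the function name and example graph are illustrative, not taken from the paper or its code:

```python
import numpy as np

def soft_config_null(A):
    """Expected adjacency matrix under the soft configuration model.

    For an undirected graph with adjacency matrix A, the expected
    number of edges between nodes i and j is k_i * k_j / (2m),
    where k_i is the degree of node i and m is the number of edges.
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)   # node degrees k_i
    two_m = deg.sum()     # 2m = sum of all degrees
    return np.outer(deg, deg) / two_m

# Example: a path graph on 3 nodes (edges 0-1 and 1-2).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
P = soft_config_null(A)
# Row sums of P reproduce the observed degrees [1, 2, 1],
# which is what makes this a degree-preserving null model.
```

Using such a null model, residual2vec can discount the part of a node's co-occurrence statistics that is explained by degree alone.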
Researcher Affiliation: Academia. Evidence: Sadamori Kojaku1, Jisung Yoon2, Isabel Constantino3, Yong-Yeol Ahn1,3,4. 1Center for Complex Networks and Systems Research, Luddy School of Informatics, Computing, and Engineering, Indiana University, USA; 2Department of Industrial Management and Engineering, Pohang University of Science and Technology, South Korea; 3Indiana University Network Science Institute, Indiana University, USA; 4Connection Science, Massachusetts Institute of Technology, USA. {skojaku, imconsta, yyahn}@iu.edu, jisung.yoon92@gmail.com
Pseudocode: No. Evidence: the paper describes its algorithms and methods in prose and equations but does not provide pseudocode or algorithm blocks.
Open Source Code: Yes. Evidence: "The Python code of residual2vec is available at GitHub [17]." [17] Kojaku, S., Yoon, J., Constantino, I. & Ahn, Y.-Y. Python package for the residual2vec graph embedding. https://github.com/skojaku/residual2vec (accessed 10/08/2021).
Open Datasets: Yes. Evidence: "We test residual2vec using link prediction and community detection benchmarks [4, 22, 38–40]. For all random-walk-based methods, we run 10 walkers per node for 80 steps and set T = 10 and training iterations to 5. We perform the benchmark for the graphs in Table 2." Table 2 (list of empirical graphs and structural properties) includes: Airport [33], Protein-Protein [34], Wikipedia vote [35], Coauthorship (HepTh) [36], Citation (DBLP) [37], and Coauthorship (AstroPh) [36].
Dataset Splits: Yes. Evidence: "Link prediction is the task of finding missing edges from the graph structure, a basic task for applications such as recommending friends and products [4, 40, 49]. The link prediction task consists of three steps. First, given a graph, a fraction (ρ = 0.5) of edges is randomly removed. Second, the edge-removed graph is embedded using a graph embedding method. Third, the removed edges are predicted based on a likelihood score computed from the generated embedding. In the edge-removal process, we keep the edges of a minimum spanning tree of the graph to ensure that the graph remains connected [4, 40]. We perform 5-fold cross-validation and compute the R²-score (Fig. 4D). To ensure that the train and test sets do not share journals in the cross-validation, we split the set of journals i into train and test sets instead of splitting the set of nodes (i, t)."
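The edge-removal step of the protocol quoted above can be sketched as follows. This is an illustrative reimplementation, not the authors' code; note that in an unweighted graph any spanning tree is a minimum spanning tree, so a BFS tree suffices to keep the training graph connected:

```python
import random

def split_edges_keep_tree(edges, rho=0.5, seed=0):
    """Hold out a fraction rho of edges as a test set while keeping the
    training graph connected: edges of a BFS spanning tree are never
    removed (any spanning tree is minimal in an unweighted graph)."""
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Build a BFS spanning tree from an arbitrary root.
    root = next(iter(adj))
    seen, tree, queue = {root}, set(), [root]
    while queue:
        u = queue.pop(0)
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                tree.add(frozenset((u, v)))
                queue.append(v)
    # Only non-tree edges are candidates for removal.
    removable = [e for e in edges if frozenset(e) not in tree]
    n_test = min(len(removable), int(rho * len(edges)))
    test = rng.sample(removable, n_test)
    train = [e for e in edges if e not in set(test)]
    return train, test

# Example: a 4-cycle with one chord; the training graph stays connected
# because the spanning-tree edges are always retained.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
train, test = split_edges_keep_tree(edges, rho=0.5)
```

The embedding and scoring steps (steps two and three of the protocol) would then operate on `train` and be evaluated against `test`.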
Hardware Specification: No. Evidence: "Because the graph has a high average degree (i.e., 1,049), some algorithms are computationally demanding. For this reason, we omit node2vec with q = 0.5 and q = 2, NetMF, and GAT due to memory shortage (1 TB of RAM). Furthermore, for GCN, we set K̂ = 32, which still took more than 18 hours." This indicates memory constraints but does not specify the exact hardware (CPU or GPU models, etc.) used for the experiments.
Software Dependencies: No. Evidence: "We use two-layer GCN, GraphSAGE, and GAT implemented in the StellarGraph package [46]." [46] CSIRO's Data61. StellarGraph machine learning library (2018). The paper names the StellarGraph library and its release year, but it does not specify exact version numbers for this library or for any other software dependency (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup: Yes. Evidence: "For all random-walk-based methods, we run 10 walkers per node for 80 steps and set T = 10 and training iterations to 5. We set the parameters of node2vec to p = 1 and q ∈ {0.5, 1, 2}. For GloVe, we input the sentences generated by the random walks. We set K̂ to the dimension K (i.e., K̂ = K). We generate graphs of N = 1,000 nodes with the parameters used in Ref. [38] and embed the graphs into a K = 64-dimensional space. We generate K = 128-dimensional embeddings with T = 10."
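As context for the p and q settings quoted above, here is a minimal sketch of a single second-order node2vec walk. This is an illustrative reimplementation of the standard biased-walk rule, not the authors' code; the adjacency-dict format and function name are assumptions:

```python
import random

def node2vec_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    """One second-order biased random walk (node2vec).

    Given the previous node `prev`, the unnormalized transition
    weights from the current node are:
      1/p  to return to `prev`,
      1    to a neighbor that is also adjacent to `prev`,
      1/q  to a neighbor farther from `prev`.
    q < 1 favors outward (DFS-like) moves; q > 1 keeps the walk local.
    """
    rng = random.Random(seed)
    walk, prev = [start], None
    for _ in range(length - 1):
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:
            break
        if prev is None:
            nxt = rng.choice(nbrs)  # first step: uniform over neighbors
        else:
            weights = []
            for v in nbrs:
                if v == prev:
                    weights.append(1.0 / p)
                elif v in adj[prev]:
                    weights.append(1.0)
                else:
                    weights.append(1.0 / q)
            nxt = rng.choices(nbrs, weights=weights, k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk

# Example: one 80-step walk (the walk length used in the paper)
# on a small triangle-plus-tail graph, with p = 1 and q = 0.5.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walk = node2vec_walk(adj, start=0, length=80, p=1.0, q=0.5)
```

Running 10 such walkers per node and feeding the resulting sequences to a word2vec- or GloVe-style trainer with window size T = 10 would mirror the setup the paper describes.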