On the Equivalence between Positional Node Embeddings and Structural Graph Representations

Authors: Balasubramaniam Srinivasan, Bruno Ribeiro

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment — each entry gives the variable, its result, and the supporting LLM response (an excerpt from the paper).
Research Type: Experimental
"4 RESULTS. This section focuses on applying the lessons learned in Section 3 in four tasks, divided into two common goals. The goal of the first three tasks is to show that, as described in Theorem 2, node embeddings can be used to create expressive structural embeddings of nodes, tuples, and triads. These representations are subsequently used to make predictions on downstream tasks with varied node set sizes. The tasks also showcase the added value of using multiple node embedding (Monte Carlo) samples to estimate structural representations, both during training and testing. Moreover, illustrating Theorem 1 and the inability of node representations to capture joint structural representations, these tasks show that structural node representations are useless in prediction tasks over more than one node, such as links and triads. The goal of the fourth task is to showcase how multiple Monte Carlo samples of node embeddings are required to observe the fundamental relationship between structural representations and node embeddings predicted by Theorem 2. ... In Table 1 we present Micro-F1 scores for all four models over the three tasks."
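As context for the Monte Carlo estimation described in the excerpt above, the following is a minimal sketch (not the authors' code) of how a structural representation of a node set, such as a link or a triad, can be estimated by averaging over several samples of positional node embeddings. The sampler callable, the readout network f, and the sample count are illustrative assumptions.

    import torch

    @torch.no_grad()
    def mc_structural_representation(sample_embeddings, f, node_set, num_samples=10):
        # sample_embeddings(): hypothetical callable returning a fresh
        # [num_nodes, dim] Monte Carlo sample of positional node embeddings.
        # f: network mapping the concatenated embeddings of the node set to a
        # representation; averaging its outputs over samples approximates the
        # expectation relating node embeddings to structural representations
        # (in the spirit of Theorem 2).
        outs = []
        for _ in range(num_samples):
            Z = sample_embeddings()
            joint = torch.cat([Z[v] for v in node_set], dim=-1)  # node, pair, or triad
            outs.append(f(joint))
        return torch.stack(outs, dim=0).mean(dim=0)

Using more samples for the average, during both training and testing, is what the first three tasks vary to show the added value of multiple Monte Carlo samples.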
Researcher Affiliation: Academia
"Balasubramaniam Srinivasan, Department of Computer Science, Purdue University, bsriniv@purdue.edu; Bruno Ribeiro, Department of Computer Science, Purdue University, ribeiro@cs.purdue.edu"
Pseudocode: Yes
"Algorithm 1: Node Embeddings from the Unrolled Gibbs Sampler ... Algorithm 2: Structural Representations from the Node Embedding Samples"
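The actual pseudocode is in the paper. Purely as a hedged illustration of the Algorithm 1 idea, one can unroll a Gibbs-style sampler for a fixed number of steps to draw one sample of node embeddings; the Gaussian neighbor-conditional below is an assumption for illustration, not the paper's sampler.

    import torch

    def unrolled_gibbs_node_embeddings(adj, dim=256, num_steps=50, noise=0.1):
        # adj: dense [n, n] adjacency matrix as a float tensor.
        n = adj.size(0)
        Z = torch.randn(n, dim)                        # random initialization
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        for _ in range(num_steps):
            neighbor_mean = adj @ Z / deg              # condition on current neighbors
            Z = neighbor_mean + noise * torch.randn(n, dim)  # resample with noise
        return Z                                       # one embedding sample

Repeated calls yield the multiple embedding samples that a routine in the spirit of Algorithm 2 then aggregates into structural representations, as sketched earlier.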
Open Source Code: Yes
"For more details refer to the code provided."
Open Datasets: Yes
"Datasets: We consider four graph datasets used by Hamilton et al. (2017a), namely Cora, Citeseer, Pubmed (Namata et al., 2012; Sen et al., 2008) and PPI (Zitnik & Leskovec, 2017). Cora, Citeseer and Pubmed are citation networks, where vertices represent papers, edges represent citations, and vertex features are bag-of-words representations of the document text. The PPI (protein-protein interaction) dataset is a collection of multiple graphs representing human tissues, where vertices represent proteins, edges represent interactions between them, and node features include genetic and immunological information."
Dataset Splits: Yes
"Train, validation and test splits are used as proposed by Yang et al. (2016) (see Table 3 in the Appendix)." Table 3 ("Summary of the datasets") reports the number of training, validation, and test vertices per dataset.
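All four datasets, together with the Yang et al. (2016) splits, are distributed with PyTorch Geometric (the framework the implementation uses); a minimal loading sketch, with the root directories as placeholders:

    from torch_geometric.datasets import Planetoid, PPI

    # Citation networks; split='public' selects the Yang et al. (2016) splits.
    cora = Planetoid(root='data/Planetoid', name='Cora', split='public')
    data = cora[0]
    print(int(data.train_mask.sum()), int(data.val_mask.sum()), int(data.test_mask.sum()))

    # PPI is a multi-graph dataset with predefined train/val/test collections.
    ppi_train = PPI(root='data/PPI', split='train')
    ppi_val = PPI(root='data/PPI', split='val')
    ppi_test = PPI(root='data/PPI', split='test')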
Hardware Specification: Yes
"Training was performed on Titan V GPUs."
Software Dependencies: Yes
"Our implementation is in PyTorch using Python 3.6. The implementations for GIN and RP-GIN use the PyTorch Geometric framework."
Experiment Setup: Yes
"We used two convolutional layers for GIN and RP-GIN, since this gave the best performance in our tasks (we tested with 2/3/4/5 convolutional layers). Also, since we perform tasks based on node representations rather than graph representations, we omit the graph-wide readout. For GIN and RP-GIN, the embedding dimension was set to 256 at both convolutional layers. All MLPs, across all models, have 256 neurons. Optimization is performed with the Adam optimizer (Kingma & Ba, 2014). For GIN and RP-GIN the learning rate was tuned in {0.01, 0.001, 0.0001, 0.00001}, whereas for CGNNs the learning rate was set to 0.001."
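A minimal sketch of this setup in PyTorch Geometric: two GINConv layers with 256-dimensional embeddings and 256-neuron MLPs, node-level outputs (no graph-wide readout), and Adam with the stated learning-rate grid. Everything beyond the quoted details (activations, training loop, feature dimension) is an assumption.

    import torch
    from torch import nn
    from torch_geometric.nn import GINConv

    def mlp(in_dim, out_dim, hidden=256):      # all MLPs use 256 neurons
        return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                             nn.Linear(hidden, out_dim))

    class TwoLayerGIN(nn.Module):
        # Two convolutional layers, 256-dim embeddings, no graph readout.
        def __init__(self, in_dim, emb_dim=256):
            super().__init__()
            self.conv1 = GINConv(mlp(in_dim, emb_dim))
            self.conv2 = GINConv(mlp(emb_dim, emb_dim))

        def forward(self, x, edge_index):
            h = torch.relu(self.conv1(x, edge_index))
            return self.conv2(h, edge_index)   # node-level representations

    model = TwoLayerGIN(in_dim=1433)           # e.g., Cora's feature dimension
    for lr in (0.01, 0.001, 0.0001, 0.00001):  # learning-rate grid from the paper
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        # ... train and validate, keeping the best lr on the validation split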