On the Equivalence between Positional Node Embeddings and Structural Graph Representations
Authors: Balasubramaniam Srinivasan, Bruno Ribeiro
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 RESULTS This section focuses on applying the lessons learned in Section 3 in four tasks, divided into two common goals. The goal of the first three tasks is to show that, as described in Theorem 2, node embeddings can be used to create expressive structural embeddings of nodes, tuples, and triads. These representations are then used to make predictions on downstream tasks with varied node set sizes. The tasks also showcase the added value of using multiple node embedding (Monte Carlo) samples to estimate structural representations, both during training and testing. Moreover, illustrating Theorem 1 and the inability of node representations to capture joint structural representations, these tasks show that structural node representations are useless in prediction tasks over more than one node, such as links and triads. The goal of the fourth task is to showcase how multiple Monte Carlo samples of node embeddings are required to observe the fundamental relationship between structural representations and node embeddings predicted by Theorem 2. ... In Table 1 we present Micro-F1 scores for all four models over the three tasks. |
| Researcher Affiliation | Academia | Balasubramaniam Srinivasan Department of Computer Science Purdue University bsriniv@purdue.edu Bruno Ribeiro Department of Computer Science Purdue University ribeiro@cs.purdue.edu |
| Pseudocode | Yes | Algorithm 1: Node Embeddings from the Unrolled Gibbs Sampler ... Algorithm 2: Structural Representations from the Node Embedding Samples |
| Open Source Code | Yes | For more details refer to the code provided. |
| Open Datasets | Yes | Datasets: We consider four graph datasets used by Hamilton et al. (2017a), namely Cora, Citeseer, Pubmed (Namata et al., 2012; Sen et al., 2008) and PPI (Zitnik & Leskovec, 2017). Cora, Citeseer and Pubmed are citation networks, where vertices represent papers, edges represent citations, and vertex features are bag-of-words representation of the document text. The PPI (protein-protein interaction) dataset is a collection of multiple graphs representing the human tissue, where vertices represent proteins, edges represent interactions across them, and node features include genetic and immunological information. |
| Dataset Splits | Yes | Train, validation and test splits are used as proposed by Yang et al. (2016) (see Table 3 in the Appendix). ... Table 3: Summary of the datasets ... Number of Training Vertices ... Number of Validation Vertices ... Number of Test Vertices |
| Hardware Specification | Yes | Training was performed on Titan V GPUs. |
| Software Dependencies | Yes | Our implementation is in PyTorch using Python 3.6. The implementations for GIN and RP-GIN are done using the PyTorch Geometric framework. |
| Experiment Setup | Yes | We used two convolutional layers for GIN and RP-GIN, since this had the best performance in our tasks (we had tested with 2/3/4/5 convolutional layers). Also, since we perform tasks based on node representations rather than graph representations, we ignore the graph-wide readout. For GIN and RP-GIN, the embedding dimension was set to 256 at both convolutional layers. All MLPs across all models have 256 neurons. Optimization is performed with the Adam optimizer (Kingma & Ba, 2014). For GIN and RP-GIN the learning rate was tuned in {0.01, 0.001, 0.0001, 0.00001}, whereas for CGNNs the learning rate was set to 0.001. |
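The experiment setup above (two GIN convolutional layers, each followed by an MLP, with no graph-wide readout) can be sketched in plain NumPy. This is a minimal illustration of the standard GIN update h_v' = MLP((1 + eps) h_v + sum of neighbor h_u), not the authors' PyTorch Geometric implementation; the toy graph, weight initialization, and the small dimension `d = 8` (the paper uses 256) are all illustrative assumptions.

```python
import numpy as np

def gin_layer(H, A, W1, b1, W2, b2, eps=0.0):
    """One GIN convolution: h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u).

    H: (n, d) node features; A: (n, n) adjacency matrix.
    The MLP here is two dense layers with ReLU activations.
    """
    agg = (1.0 + eps) * H + A @ H            # sum-aggregate over neighbors
    hidden = np.maximum(agg @ W1 + b1, 0.0)  # MLP layer 1 + ReLU
    return np.maximum(hidden @ W2 + b2, 0.0) # MLP layer 2 + ReLU

rng = np.random.default_rng(0)
n, d = 5, 8  # toy sizes; the paper uses 256-dimensional embeddings

# Toy path graph 0-1-2-3-4 (illustrative, not one of the paper's datasets).
A = np.zeros((n, n))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[u, v] = A[v, u] = 1.0

H = rng.standard_normal((n, d))

# Two convolutional layers, matching the setup reported above;
# node representations are used directly (no graph-wide readout).
for _ in range(2):
    W1, b1 = 0.1 * rng.standard_normal((d, d)), np.zeros(d)
    W2, b2 = 0.1 * rng.standard_normal((d, d)), np.zeros(d)
    H = gin_layer(H, A, W1, b1, W2, b2)

print(H.shape)  # per-node structural representations: (5, 8)
```

In the paper's actual pipeline these per-node outputs feed downstream node, link, and triad prediction tasks; RP-GIN additionally randomizes node identifiers, which this sketch omits.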