A Topological Perspective on Demystifying GNN-Based Link Prediction Performance
Authors: Yu Wang, Tong Zhao, Yuying Zhao, Yunchao Liu, Xueqi Cheng, Neil Shah, Tyler Derr
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To this end, we demystify which nodes perform better from the perspective of their local topology. Despite the widespread belief that low-degree nodes exhibit worse LP performance, we surprisingly observe an inconsistent performance trend. This prompts us to propose a node-level metric, Topological Concentration (TC), based on the intersection of the local subgraph of each node with the ones of its neighbors. We empirically demonstrate that TC correlates with LP performance more than other node-level topological metrics, better identifying low-performing nodes than using degree. With TC, we discover a novel topological distribution shift issue in which a node's newly joined neighbors tend to become less interactive with its existing neighbors, compromising the generalizability of node embeddings for LP at testing time. To make the computation of TC scalable, we further propose Approximated Topological Concentration (ATC) and justify its efficacy in approximating TC with reduced computation complexity. Given the positive correlation between node TC and its LP performance, we explore the potential of boosting LP performance via enhancing TC by re-weighting edges in the message passing and discuss its effectiveness and limitations. Our code is publicly available at https://github.com/YuWVandy/Topo_LP_GNN. (A minimal 1-hop TC sketch follows the table.) |
| Researcher Affiliation | Collaboration | Yu Wang¹, Tong Zhao², Yuying Zhao¹, Yunchao Liu¹, Xueqi Cheng¹, Neil Shah², Tyler Derr¹ (¹Vanderbilt University, ²Snap Inc.) |
| Pseudocode | Yes | Algorithm 1: Edge Reweighting to Boost LP performance |
| Open Source Code | Yes | Our code is publicly available at https://github.com/YuWVandy/Topo_LP_GNN. |
| Open Datasets | Yes | We use five widely employed datasets for evaluating the link prediction task, including four citation networks: Cora, Citeseer, Pubmed, and Citation2, and one human social network, Collab. We further introduce two real-world animal social networks, Reptile and Vole, based on animal interactions. ... Table 3: Statistics of datasets used for evaluating link prediction. |
| Dataset Splits | Yes | Following (Zhao et al., 2022; Chamberlain et al., 2022; Wang et al., 2023), we randomly split edges into 70%/10%/20% so that there is no topological distribution shift in these datasets. We use Hits@100 to evaluate the final performance. Collab/Citation2: We leverage the default edge splitting from OGBL (Hu et al., 2020). (A minimal random edge-split sketch follows the table.) |
| Hardware Specification | No | The paper states 'Due to GPU memory limitation, we choose SAGE for Citation2.' but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software like GCN/SAGE/LightGCN and NCN, but does not provide specific version numbers for these or other software dependencies (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | The search space for the hyperparameters of the GCN/SAGE/LightGCN baselines and their augmented variants GCN_rw/SAGE_rw is: graph convolutional layers {1, 2, 3}, hidden dimension of the graph encoder {64, 128, 256}, learning rate of the encoder and predictor {0.001, 0.005, 0.01}, dropout {0.2, 0.5, 0.8}, training epochs {50, 100, 500, 1000}, batch size {256, 1152, 64 * 1024} (Hu et al., 2020; Chamberlain et al., 2022; Wang et al., 2023), weights α {0.5, 1, 2, 3, 4}, update interval τ {1, 2, 10, 20, 50}, warm-up epochs T_warm {1, 2, 5, 10, 30, 50}. (The reported search space is encoded as a grid sketch after the table.) |
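
The Topological Concentration (TC) metric quoted above measures how strongly a node's local subgraph overlaps with the local subgraphs of its neighbors. Below is a minimal sketch of a 1-hop variant with a Jaccard-style normalization; the paper defines TC over intersections of each node's k-hop local subgraph with those of its neighbors, so this is an assumption-laden simplification, and the function name `one_hop_tc` is ours rather than from the released code.

```python
# Hedged sketch: a 1-hop approximation of Topological Concentration (TC).
# The paper's TC intersects k-hop local subgraphs of a node and its neighbors;
# this sketch only covers the simplest 1-hop case with a Jaccard-style
# normalization (an assumption, not the paper's exact formula).
import networkx as nx


def one_hop_tc(graph: nx.Graph, node) -> float:
    """Average normalized overlap between `node`'s neighborhood and the
    neighborhoods of its neighbors."""
    neighbors = set(graph.neighbors(node))
    if not neighbors:
        return 0.0
    overlaps = []
    for j in neighbors:
        nj = set(graph.neighbors(j))
        overlaps.append(len(neighbors & nj) / len(neighbors | nj))
    return sum(overlaps) / len(overlaps)


if __name__ == "__main__":
    g = nx.karate_club_graph()  # toy stand-in for a citation graph
    scores = {v: one_hop_tc(g, v) for v in g.nodes}
    print(sorted(scores.items(), key=lambda kv: kv[1])[:5])  # lowest-overlap nodes
```

Ranking nodes by such a score gives a rough proxy for the low-TC nodes that the paper associates with weaker LP performance.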
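
For the datasets without a default OGBL split, the reported protocol is a random 70%/10%/20% edge split. A minimal sketch under that assumption follows; the paper's actual splitting code may differ (e.g., it also needs negative edges for evaluation).

```python
# Hedged sketch: a random 70%/10%/20% positive-edge split for link prediction.
# Function and variable names are ours; negative sampling and the construction
# of the message-passing graph are omitted.
import random


def split_edges(edges, train_frac=0.7, valid_frac=0.1, seed=0):
    """Shuffle the edge list and split it into train/valid/test subsets."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_train = int(train_frac * len(edges))
    n_valid = int(valid_frac * len(edges))
    train = edges[:n_train]
    valid = edges[n_train:n_train + n_valid]
    test = edges[n_train + n_valid:]  # remaining ~20%
    return train, valid, test
```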
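
The hyperparameter search space reported in the last row can be written down as a simple grid. The key names below are hypothetical; only the value ranges come from the paper, which does not specify a configuration format or the exact search procedure.

```python
# Hedged sketch: the reported hyperparameter search space encoded as a grid.
# Key names are hypothetical; only the value ranges are taken from the paper.
from itertools import product

SEARCH_SPACE = {
    "num_layers": [1, 2, 3],
    "hidden_dim": [64, 128, 256],
    "lr": [0.001, 0.005, 0.01],
    "dropout": [0.2, 0.5, 0.8],
    "epochs": [50, 100, 500, 1000],
    "batch_size": [256, 1152, 64 * 1024],
    "alpha": [0.5, 1, 2, 3, 4],            # weights alpha (edge re-weighting)
    "tau": [1, 2, 10, 20, 50],             # update interval tau
    "warmup_epochs": [1, 2, 5, 10, 30, 50],
}


def iter_configs(space):
    """Yield every combination in the grid as a config dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))
```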