Online GNN Evaluation Under Test-time Graph Distribution Shifts
Authors: Xin Zheng, Dongjin Song, Qingsong Wen, Bo Du, Shirui Pan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on real-world test graphs under diverse graph distribution shifts could verify the effectiveness of the proposed method, revealing its strong correlation with ground-truth test errors on various well-trained GNN models. |
| Researcher Affiliation | Collaboration | Xin Zheng, Monash University, Melbourne, Australia, xin.zheng@monash.edu; Dongjin Song, University of Connecticut, Storrs, USA, dongjin.song@uconn.edu; Qingsong Wen, Squirrel AI, Bellevue, USA, qingsongedu@gmail.com; Bo Du, Wuhan University, Wuhan, China, dubo@whu.edu; Shirui Pan, Griffith University, Queensland, Australia, s.pan@griffith.edu.au |
| Pseudocode | Yes | Algorithm 1 Learning Behavior Discrepancy (LEBED) Score Computation. |
| Open Source Code | Yes | Code is available at https://github.com/Amanda-Zheng/LEBED |
| Open Datasets | Yes | We perform experiments on six real-world graph datasets with diverse graph data distribution shifts containing: node feature shifts (Wu et al., 2022; Jin et al., 2023b), domain shifts (Wu et al., 2020), temporal shifts (Wu et al., 2022). Detailed statistics of all these datasets are listed in Table A1 in Appendix B. |
| Dataset Splits | Yes | For all training graphs and validation graphs, we follow the process procedures and splits in works (Wu et al., 2022) and (Wu et al., 2020). |
| Hardware Specification | Yes | The running time comparison on Citationv2 in seconds is shown in Fig. 3 with a single GeForce RTX 3080 GPU and 200 iterations for w/ Dstru. |
| Software Dependencies | No | In our experiments, we use PyTorch Geometric library (Fey & Lenssen, 2019) and four GeForce RTX 3080 GPUs for all implementations. However, specific version numbers for the software dependencies are not provided. |
| Experiment Setup | Yes | More details of these well-trained GNN models, including architectures, training hyper-parameters, and ground-truth test error distributions, are provided in Appendix D. We report the correlation between the proposed LEBED and the ground-truth test errors under unseen and unlabeled test graphs with distribution shifts, using R² and rank correlation Spearman's ρ, where R² ranges [0, 1], representing the degree of linear fit between two variables. The closer it is to 1, the higher the linear correlation. Spearman's ρ ranges [-1, 1], representing the monotonic correlation between two variables with 1 indicating the positive correlation and -1 indicating the negative correlation. |
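
The two correlation metrics named in the Experiment Setup row can be illustrated with a minimal NumPy sketch. This is not the authors' evaluation code: the function names (`r_squared`, `average_ranks`, `spearman_rho`) and the sample numbers are hypothetical, and R² is assumed here to mean the coefficient of determination of a simple least-squares line fitted to (score, test error) pairs.

```python
import numpy as np


def r_squared(scores, errors):
    """Coefficient of determination of a least-squares line fit (assumed definition)."""
    x = np.asarray(scores, dtype=float)
    y = np.asarray(errors, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # fit errors ~ slope * score + intercept
    ss_res = np.sum((y - (slope * x + intercept)) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    v = np.asarray(values, dtype=float)
    order = np.argsort(v, kind="mergesort")
    ranks = np.empty(len(v), dtype=float)
    ranks[order] = np.arange(1, len(v) + 1)
    for val in np.unique(v):
        mask = v == val
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman_rho(scores, errors):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx = average_ranks(scores)
    ry = average_ranks(errors)
    return float(np.corrcoef(rx, ry)[0, 1])


if __name__ == "__main__":
    # Hypothetical values for illustration only, not results from the paper.
    scores = [0.1, 0.4, 0.5, 0.9, 1.3]       # e.g. one score per test graph
    errors = [12.0, 18.5, 17.0, 30.2, 41.0]  # e.g. ground-truth test error (%)
    print("R^2:", r_squared(scores, errors))
    print("Spearman rho:", spearman_rho(scores, errors))
```

A score that tracks test error well would give R² close to 1 and ρ close to 1. A ρ near -1 would mean the score decreases monotonically as the error grows.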