Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data

Authors: Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the effectiveness of SR-GNN in a variety of experiments with biased training datasets on common GNN benchmark datasets for semi-supervised learning, where we see that SR-GNN outperforms other GNN baselines in accuracy, addressing at least 40% of the negative effects introduced by biased training data. On the largest dataset we consider, ogb-arxiv, we observe a 2% absolute improvement over the baseline and are able to mitigate 30% of the negative effects from training data bias. ... In this section we first describe how we create a training set with a controllable amount of bias, then discuss our experiment design, demonstrate the efficacy of our proposed framework for handling bias as well as its advantages over domain adaptation baselines, and finally, present a study on sensitivity to the hyperparameters.
Researcher Affiliation | Collaboration | *: University of Illinois Urbana-Champaign; †: Google Research. *{qiz3,hanj}@illinois.edu, {nponomareva,bperozzi}@google.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and processed data are available at https://github.com/GentleZhu/Shift-Robust-GNNs.
Open Datasets | Yes | In our experiments, we perform semi-supervised node classification tasks on five popular benchmark datasets: Cora, Citeseer, Pubmed [27], ogb-arxiv [26] and Reddit [11]. We use the same validation and test splits as in the original GCN paper [15] and OGB benchmark.
Dataset Splits | Yes | We use the same validation and test splits as in the original GCN paper [15] and OGB benchmark. We use the remaining nodes for training. For the unbiased baseline's performance numbers, we use a random sample from this training data.
Hardware Specification | Yes | Our experiments were run on a single machine with 8 CPUs and 1 Nvidia T4 GPU.
Software Dependencies | No | The paper mentions using Adam [14] as an optimizer and specific GNN models, but it does not provide version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The main hyperparameters in our sampler PPR-S are α = 0.1, γ = 100. When the graph is large, we set ϵ = 0.001 in the local algorithm for sparse PPR approximation. In SR-GNN, λ = 1.0 is the penalty parameter for the discrepancy regularizer d; the lower bound for the instance weight Bl is 0.2. For all of the GNN methods except DGI, we set the hidden dimension as 32 for Cora, Citeseer, Pubmed and 256 for ogb-arxiv, with a dropout of 0.5. ... We use Adam [14] as an optimizer, and set the learning rate to 0.01 and L2 regularization to 5e-4.
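The reported setup above can be collected into a single configuration sketch. The values below are the hyperparameters stated in the paper; everything else (the dictionary keys, function names, the toy power-iteration PPR, and the formula for "% of negative effects mitigated") is an illustrative assumption, not the authors' released code or definitions:

```python
# Hyperparameters as reported by the paper; names are illustrative, not the authors'.
SRGNN_CONFIG = {
    "ppr_alpha": 0.1,            # PPR-S teleport parameter alpha
    "ppr_gamma": 100,            # PPR-S gamma
    "ppr_eps": 1e-3,             # tolerance for sparse PPR on large graphs
    "lambda_disc": 1.0,          # penalty on the discrepancy regularizer d
    "weight_lower_bound": 0.2,   # lower bound Bl on instance weights
    "hidden_dim": {"cora": 32, "citeseer": 32, "pubmed": 32, "ogb-arxiv": 256},
    "dropout": 0.5,
    "optimizer": "adam",
    "learning_rate": 0.01,
    "l2_regularization": 5e-4,
}


def personalized_pagerank(adj, seed, alpha=0.1, iters=50):
    """Plain power iteration for single-seed PPR scores.

    A toy stand-in for the flavor of the paper's PPR-S sampler: a biased
    training set can be drawn from the highest-scoring nodes near a seed.
    adj maps each node to a list of its neighbours.
    """
    scores = {v: 0.0 for v in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, neighbours in adj.items():
            if neighbours:
                # Spread (1 - alpha) of v's mass uniformly over its neighbours.
                share = (1.0 - alpha) * scores[v] / len(neighbours)
                for u in neighbours:
                    nxt[u] += share
        nxt[seed] += alpha  # teleport mass back to the seed
        scores = nxt
    return scores


def negative_effect_mitigated(acc_unbiased, acc_biased_baseline, acc_biased_method):
    """One plausible reading of "mitigates X% of the negative effects":
    the fraction of the unbiased-vs-biased accuracy gap that the method
    recovers (a hypothetical metric, not taken from the paper)."""
    gap = acc_unbiased - acc_biased_baseline
    return 100.0 * (acc_biased_method - acc_biased_baseline) / gap


# Toy usage: biased sampling scores around seed node 0 on a 4-node graph.
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
ppr = personalized_pagerank(adj, seed=0, alpha=SRGNN_CONFIG["ppr_alpha"])
top_nodes = sorted(ppr, key=ppr.get, reverse=True)
```

Under this reading, a method that lifts biased-training accuracy from 70% to 74% while the unbiased baseline sits at 80% would mitigate 40% of the negative effect, which matches the arithmetic shape of the "at least 40%" and "30%" claims quoted above.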