Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training Data

Authors: Qi Zhu, Natalia Ponomareva, Jiawei Han, Bryan Perozzi

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate the effectiveness of SR-GNN in a variety of experiments with biased training datasets on common GNN benchmark datasets for semi-supervised learning, where we see that SR-GNN outperforms other GNN baselines in accuracy, addressing at least 40% of the negative effects introduced by biased training data. On the largest dataset we consider, ogb-arxiv, we observe a 2% absolute improvement over the baseline and are able to mitigate 30% of the negative effects from training data bias. ... In this section we first describe how we create a training set with a controllable amount of bias, then discuss our experiment design, demonstrate the efficacy of our proposed framework for handling bias as well as its advantages over domain adaptation baselines, and finally, present a study on sensitivity to the hyperparameters.
Researcher Affiliation | Collaboration | *: University of Illinois Urbana-Champaign; †: Google Research. *{qiz3,hanj}@illinois.edu, {nponomareva,bperozzi}@google.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and processed data are available at https://github.com/GentleZhu/Shift-Robust-GNNs.
Open Datasets | Yes | In our experiments, we perform semi-supervised node classification tasks on five popular benchmark datasets: Cora, Citeseer, Pubmed [27], ogb-arxiv [26] and Reddit [11]. We use the same validation and test splits as in the original GCN paper [15] and OGB benchmark.
Dataset Splits | Yes | We use the same validation and test splits as in the original GCN paper [15] and OGB benchmark. We use the remaining nodes for training. For the unbiased baseline's performance numbers, we use a random sample from this training data.
Hardware Specification | Yes | Our experiments were run on a single machine with 8 CPUs and 1 Nvidia T4 GPU.
Software Dependencies | No | The paper mentions using Adam [14] as an optimizer and specific GNN models, but it does not provide version numbers for any software dependencies like programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | The main hyperparameters in our sampler PPR-S are α = 0.1, γ = 100. When the graph is large, we set ϵ = 0.001 in the local algorithm for sparse PPR approximation. In SR-GNN, λ = 1.0 is the penalty parameter for the discrepancy regularizer d; the lower bound for the instance weight Bl is 0.2. For all of the GNN methods except DGI, we set the hidden dimension as 32 for Cora, Citeseer, Pubmed and 256 for ogb-arxiv, with a dropout of 0.5. ... We use Adam [14] as an optimizer, and set the learning rate to 0.01 and L2 regularization to 5e-4.
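The reported setup above can be collected into a single configuration sketch. The values below are the hyperparameters stated in the paper; everything else (the dictionary keys, function names, the toy power-iteration PPR, and the formula for "% of negative effects mitigated") is an illustrative assumption, not the authors' released code or definitions:

```python
# Hyperparameters as reported by the paper; names are illustrative, not the authors'.
SRGNN_CONFIG = {
    "ppr_alpha": 0.1,            # PPR-S teleport parameter alpha
    "ppr_gamma": 100,            # PPR-S gamma
    "ppr_eps": 1e-3,             # tolerance for sparse PPR on large graphs
    "lambda_disc": 1.0,          # penalty on the discrepancy regularizer d
    "weight_lower_bound": 0.2,   # lower bound Bl on instance weights
    "hidden_dim": {"cora": 32, "citeseer": 32, "pubmed": 32, "ogb-arxiv": 256},
    "dropout": 0.5,
    "optimizer": "adam",
    "learning_rate": 0.01,
    "l2_regularization": 5e-4,
}


def personalized_pagerank(adj, seed, alpha=0.1, iters=50):
    """Plain power iteration for single-seed PPR scores.

    A toy stand-in for the flavor of the paper's PPR-S sampler: a biased
    training set can be drawn from the highest-scoring nodes near a seed.
    adj maps each node to a list of its neighbours.
    """
    scores = {v: 0.0 for v in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, neighbours in adj.items():
            if neighbours:
                # Spread (1 - alpha) of v's mass uniformly over its neighbours.
                share = (1.0 - alpha) * scores[v] / len(neighbours)
                for u in neighbours:
                    nxt[u] += share
        nxt[seed] += alpha  # teleport mass back to the seed
        scores = nxt
    return scores


def negative_effect_mitigated(acc_unbiased, acc_biased_baseline, acc_biased_method):
    """One plausible reading of "mitigates X% of the negative effects":
    the fraction of the unbiased-vs-biased accuracy gap that the method
    recovers (a hypothetical metric, not taken from the paper)."""
    gap = acc_unbiased - acc_biased_baseline
    return 100.0 * (acc_biased_method - acc_biased_baseline) / gap


# Toy usage: biased sampling scores around seed node 0 on a 4-node graph.
adj = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
ppr = personalized_pagerank(adj, seed=0, alpha=SRGNN_CONFIG["ppr_alpha"])
top_nodes = sorted(ppr, key=ppr.get, reverse=True)
```

Under this reading, a method that lifts biased-training accuracy from 70% to 74% while the unbiased baseline sits at 80% would mitigate 40% of the negative effect, which matches the arithmetic shape of the "at least 40%" and "30%" claims quoted above.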