Bandit Samplers for Training Graph Neural Networks

Authors: Ziqi Liu, Zhengwei Wu, Zhiqiang Zhang, Jun Zhou, Shuang Yang, Le Song, Yuan Qi

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We theoretically show that our algorithm asymptotically approaches the optimal variance within a factor of 3. We show the efficiency and effectiveness of our approach on multiple datasets. We empirically show that our approaches are highly competitive in terms of convergence and sample variance, compared with state-of-the-art approaches on multiple public datasets.
Researcher Affiliation | Collaboration | Ziqi Liu (Ant Group, ziqiliu@antfin.com); Zhengwei Wu (Ant Group, zejun.wzw@antfin.com); Zhiqiang Zhang (Ant Group, lingyao.zzq@antfin.com); Jun Zhou (Ant Group, jun.zhoujun@antfin.com); Shuang Yang (Alibaba Group, shuang.yang@antfin.com); Le Song (Ant Group and Georgia Institute of Technology, lsong@cc.gatech.edu); Yuan Qi (Ant Group, yuan.qi@antfin.com)
Pseudocode | Yes | Algorithm 1: Bandit Samplers for Training GNNs. (An illustrative sketch follows the table.)
Open Source Code | Yes | Please find our implementations at https://github.com/xavierzw/gnn-bs.
Open Datasets | Yes | We report results on 5 benchmark datasets: Cora [18], Pubmed [18], PPI [11], Reddit [11], and Flickr [22].
Dataset Splits | Yes | We follow the standard data splits and summarize the statistics in Table 1. Following the existing implementations, we save the model based on the best validation results and restore it to report results on the test data in Section 7.1. (A checkpoint-selection sketch follows the table.)
Hardware Specification | Yes | We run all the experiments using one machine with an Intel Xeon E5-2682 CPU and 512 GB RAM.
Software Dependencies | No | No specific software versions (e.g., Python 3.8, PyTorch 1.9) were mentioned for reproducibility.
Experiment Setup | Yes | We fix the number of layers as 2, as in [13], for all comparison algorithms. We set the dimension of hidden embeddings as 16 for Cora and Pubmed, and 256 for PPI, Reddit and Flickr. For a fair comparison, we do not use the normalization layer [2] particularly used in some works [5, 22]. For attentive GNNs, we use the attention layer proposed in GAT and set the number of multi-heads as 1 for simplicity. We do a grid search over the following hyperparameters for each algorithm: the learning rate {0.01, 0.001}, the penalty weight on the ℓ2-norm regularizers {0, 0.0001, 0.0005, 0.001}, and the dropout rate {0, 0.1, 0.2, 0.3}. For the sample size in GraphSAGE, S-GCN and our algorithms, we set 1 for Cora and Pubmed, 5 for Flickr, and 10 for PPI and Reddit. (The grid is sketched in code below.)
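
The Pseudocode row refers to the paper's Algorithm 1. Below is a minimal, hypothetical sketch of the general idea only, assuming one bandit per target node whose arms are its neighbors and an EXP3-style exponential-weights update; the class name, reward definition, and update rule are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch only (not the authors' Algorithm 1): a per-node bandit
# whose arms are the node's neighbors, sampled in proportion to exponential
# weights and updated with importance-weighted rewards (EXP3-style).
import numpy as np

class BanditNeighborSampler:
    def __init__(self, neighbors, eta=0.1):
        self.neighbors = neighbors                      # dict: node -> list of neighbor ids
        self.eta = eta                                  # step size of the weight update
        self.weights = {v: np.ones(len(n)) for v, n in neighbors.items()}

    def sample(self, v, k):
        """Draw k distinct neighbors of v with probability proportional to their weights."""
        w = self.weights[v]
        probs = w / w.sum()
        idx = np.random.choice(len(w), size=min(k, len(w)), replace=False, p=probs)
        return idx, probs[idx]

    def update(self, v, idx, rewards, probs):
        """Exponential-weights update; rewards would come from a variance-reduction signal."""
        for i, r, p in zip(idx, rewards, probs):
            self.weights[v][i] *= np.exp(self.eta * r / max(p, 1e-12))

# usage: sample 2 of node 0's neighbors, then feed back rewards for the drawn arms
sampler = BanditNeighborSampler({0: [1, 2, 3, 4]})
idx, probs = sampler.sample(0, k=2)
sampler.update(0, idx, rewards=[0.5, 0.2], probs=probs)
```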
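For the Dataset Splits row, the save-best-on-validation / restore-for-test protocol can be summarized with a small framework-agnostic sketch; `evaluate` and the in-memory snapshot below are placeholders, not the repository's actual API.

```python
# Hypothetical sketch of the model-selection protocol described above:
# keep the snapshot with the best validation score, restore it for the test metric.
import random

def evaluate(split):                      # placeholder metric
    return random.random()

best_val, best_snapshot = float("-inf"), None
for epoch in range(10):
    # ... one training epoch would run here ...
    val_score = evaluate("val")
    if val_score > best_val:              # save only the best-validation model
        best_val, best_snapshot = val_score, {"epoch": epoch}

# restore best_snapshot before reporting the test result
test_score = evaluate("test")
print(best_snapshot, test_score)
```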
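The hyperparameter grid from the Experiment Setup row can be written out explicitly; only the value sets come from the paper, while the key names and enumeration below are illustrative assumptions.

```python
# The reported search space, enumerated with itertools.product.
# Key names (learning_rate, weight_decay, dropout) are illustrative choices.
from itertools import product

param_grid = {
    "learning_rate": [0.01, 0.001],
    "weight_decay":  [0, 0.0001, 0.0005, 0.001],   # penalty on the l2-norm regularizers
    "dropout":       [0, 0.1, 0.2, 0.3],
}

# per-dataset neighbor sample size for GraphSAGE, S-GCN, and the bandit samplers
sample_size = {"Cora": 1, "Pubmed": 1, "Flickr": 5, "PPI": 10, "Reddit": 10}

configs = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
print(len(configs))   # 2 * 4 * 4 = 32 configurations per algorithm and dataset
```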