Bandit Samplers for Training Graph Neural Networks
Authors: Ziqi Liu, Zhengwei Wu, Zhiqiang Zhang, Jun Zhou, Shuang Yang, Le Song, Yuan Qi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically show that our algorithm asymptotically approaches the optimal variance within a factor of 3, and we show the efficiency and effectiveness of our approach on multiple datasets. We empirically show that our approaches are highly competitive in terms of convergence and sample variance compared with state-of-the-art approaches on multiple public datasets. |
| Researcher Affiliation | Collaboration | Ziqi Liu (Ant Group, ziqiliu@antfin.com); Zhengwei Wu (Ant Group, zejun.wzw@antfin.com); Zhiqiang Zhang (Ant Group, lingyao.zzq@antfin.com); Jun Zhou (Ant Group, jun.zhoujun@antfin.com); Shuang Yang (Alibaba Group, shuang.yang@antfin.com); Le Song (Ant Group & Georgia Institute of Technology, lsong@cc.gatech.edu); Yuan Qi (Ant Group, yuan.qi@antfin.com) |
| Pseudocode | Yes | Algorithm 1 Bandit Samplers for Training GNNs. |
| Open Source Code | Yes | Please find our implementations at https://github.com/xavierzw/gnn-bs. |
| Open Datasets | Yes | We report results on 5 benchmark datasets that include Cora [18], Pubmed [18], PPI [11], Reddit [11], and Flickr [22]. |
| Dataset Splits | Yes | We follow the standard data splits, and summarize the statistics in Table 1. By following the existing implementations, we save the model based on the best results on validation, and restore the model to report results on testing data in Section 7.1. |
| Hardware Specification | Yes | We run all the experiments using one machine with Intel Xeon E5-2682 and 512GB RAM. |
| Software Dependencies | No | No specific software versions (e.g., Python 3.8, PyTorch 1.9) were mentioned for reproducibility. |
| Experiment Setup | Yes | We fix the number of layers as 2 as in [13] for all comparison algorithms. We set the dimension of hidden embeddings as 16 for Cora and Pubmed, and 256 for PPI, Reddit and Flickr. For a fair comparison, we do not use the normalization layer [2] particularly used in some works [5, 22]. For attentive GNNs, we use the attention layer proposed in GAT and set the number of multi-heads as 1 for simplicity. We do grid search over the following hyperparameters in each algorithm: the learning rate {0.01, 0.001}, the penalty weight on the ℓ2-norm regularizers {0, 0.0001, 0.0005, 0.001}, and the dropout rate {0, 0.1, 0.2, 0.3}. For the sample size in GraphSAGE, S-GCN and our algorithms, we set 1 for Cora and Pubmed, 5 for Flickr, and 10 for PPI and Reddit. (A sketch of this configuration grid follows the table.) |
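
The Experiment Setup row fixes the architecture per dataset and grid-searches three hyperparameters. Below is a minimal sketch of that search, assuming a user-supplied `train_and_eval(cfg)` routine that trains a model under a given configuration and returns a validation score; the function and variable names are hypothetical and are not taken from the released code at https://github.com/xavierzw/gnn-bs.

```python
# Hypothetical sketch of the hyperparameter grid described in the Experiment Setup row.
# `train_and_eval` is a placeholder, not part of the authors' released implementation.
from itertools import product

# Grid values reported in the paper's setup.
LEARNING_RATES = [0.01, 0.001]
L2_PENALTIES = [0.0, 0.0001, 0.0005, 0.001]
DROPOUT_RATES = [0.0, 0.1, 0.2, 0.3]

# Fixed per-dataset choices reported in the paper: hidden embedding size and
# neighbor sample size (used by GraphSAGE, S-GCN, and the bandit samplers).
DATASET_CONFIG = {
    "cora":   {"hidden": 16,  "sample_size": 1},
    "pubmed": {"hidden": 16,  "sample_size": 1},
    "flickr": {"hidden": 256, "sample_size": 5},
    "ppi":    {"hidden": 256, "sample_size": 10},
    "reddit": {"hidden": 256, "sample_size": 10},
}

def grid_search(dataset, train_and_eval):
    """Try every (lr, l2, dropout) combination on `dataset` and keep the
    configuration with the best validation score."""
    fixed = DATASET_CONFIG[dataset]
    best_score, best_cfg = float("-inf"), None
    for lr, l2, dropout in product(LEARNING_RATES, L2_PENALTIES, DROPOUT_RATES):
        cfg = {"lr": lr, "l2": l2, "dropout": dropout,
               "num_layers": 2, **fixed}          # 2 GNN layers, as in the paper
        score = train_and_eval(cfg)               # returns a validation metric
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```

The grid yields 2 × 4 × 4 = 32 configurations per dataset; consistent with the Dataset Splits row, the model with the best validation score would then be restored to produce the test results reported in Section 7.1.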