Stochastic Training of Graph Convolutional Networks with Variance Reduction

Authors: Jianfei Chen, Jun Zhu, Le Song

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our algorithms enjoy similar convergence rate and model quality with the exact algorithm using only two neighbors per node. The running time of our algorithms on a large Reddit dataset is only one seventh of previous neighbor sampling algorithms. We empirically test our algorithms on six graph datasets, and the results match with the theory. The experiments are done on a Titan X (Maxwell) GPU.
Researcher Affiliation | Collaboration | Jianfei Chen (1), Jun Zhu (1), Le Song (2,3); (1) Dept. of Comp. Sci. & Tech., BNRist Center, State Key Lab for Intell. Tech. & Sys., THBI Lab, Tsinghua University, Beijing, 100084, China; (2) Georgia Institute of Technology; (3) Ant Financial
Pseudocode | Yes | We have the pseudocode for the training in Appendix D.
Open Source Code | Yes | Our code is released at https://github.com/thu-ml/stochastic_gcn.
Open Datasets | Yes | We examine the variance and convergence of our algorithms empirically on six datasets, including Citeseer, Cora, PubMed and NELL from Kipf & Welling (2017) and Reddit, PPI from Hamilton et al. (2017a), with the same train / validation / test splits, as summarized in Table 1.
Dataset Splits | Yes | We examine the variance and convergence of our algorithms empirically on six datasets, including Citeseer, Cora, PubMed and NELL from Kipf & Welling (2017) and Reddit, PPI from Hamilton et al. (2017a), with the same train / validation / test splits, as summarized in Table 1. (A loader sketch for these public splits appears after the table.)
Hardware Specification | Yes | The experiments are done on a Titan X (Maxwell) GPU.
Software Dependencies | No | The paper mentions 'frameworks such as TensorFlow (Abadi et al., 2016)' but does not specify a version number for TensorFlow or any other software dependency.
Experiment Setup | Yes | We set the dropout rate as zero and plot the training loss with respect to number of epochs as Fig. 2. All the four algorithms have similar low time complexity per epoch with D(l) = 2, while M1+PP takes D(l) = 20. (A minimal sketch of this D(l)-neighbor estimator follows the table.)
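
The D(l) = 2 setting quoted above refers to the paper's control-variate (CV) neighbor-sampling estimator, which keeps stale historical activations for every node and samples only a few neighbors for the fresh residual term. The sketch below is a minimal NumPy illustration of that estimator, not the authors' released TensorFlow implementation (see the repository linked in the table); the dense propagation matrix and the function name cv_aggregate are assumptions made for illustration.

```python
import numpy as np

def cv_aggregate(P, H, H_hist, D=2, rng=None):
    """Control-variate (CV) estimate of one propagation step P @ H.

    P      : (n, n) normalized propagation matrix (dense only for brevity)
    H      : (n, d) current activations
    H_hist : (n, d) stale historical activations stored for every node
    D      : neighbors sampled per node (the paper reports D(l) = 2 suffices)
    """
    rng = np.random.default_rng() if rng is None else rng
    out = np.zeros_like(H)
    for u in range(P.shape[0]):
        nbrs = np.flatnonzero(P[u])                   # neighborhood of u
        # Exact, cheap term: all neighbors, but with stale activations.
        hist_term = P[u, nbrs] @ H_hist[nbrs]
        # Monte-Carlo term: only D sampled neighbors on the residual H - H_hist,
        # rescaled so the estimate stays unbiased.
        sampled = rng.choice(nbrs, size=min(D, nbrs.size), replace=False)
        mc_term = (nbrs.size / sampled.size) * (P[u, sampled] @ (H[sampled] - H_hist[sampled]))
        out[u] = hist_term + mc_term
    return out
```

As the historical activations approach the exact ones during training, the residual H - H_hist shrinks and the variance of the sampled term vanishes, which is the intuition behind the variance-reduction claim quoted in the Research Type row.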
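
The Citeseer, Cora and PubMed rows refer to the standard public splits from Kipf & Welling (2017). As a reproducibility aid, the snippet below loads those splits via PyTorch Geometric's Planetoid dataset; this loader is an assumption for illustration and is not part of the paper's released code (Reddit and PPI are likewise available as torch_geometric.datasets.Reddit and PPI).

```python
# Assumes PyTorch Geometric is installed; not part of the paper's pipeline.
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root='data/Cora', name='Cora')    # also 'CiteSeer', 'PubMed'
data = dataset[0]

# The public Kipf & Welling (2017) splits ship as boolean node masks.
print(data.train_mask.sum().item(),   # 140 training nodes for Cora
      data.val_mask.sum().item(),     # 500 validation nodes
      data.test_mask.sum().item())    # 1000 test nodes
```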