GCN meets GPU: Decoupling “When to Sample” from “How to Sample”
Authors: Morteza Ramezani, Weilin Cong, Mehrdad Mahdavi, Anand Sivasubramaniam, Mahmut T. Kandemir
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct extensive numerical experiments on different large-scale graph datasets and different sampling methods to corroborate our theoretical findings, and demonstrate the practical efficacy of the proposed algorithm over competitive baselines. Overall, our empirical results demonstrate that LAZYGCN can significantly reduce the number of sampling steps and yield superior speedup without compromising the accuracy. |
| Researcher Affiliation | Academia | Morteza Ramezani (Pennsylvania State University, morteza@cse.psu.edu); Weilin Cong (Pennsylvania State University, wxc272@psu.edu); Mehrdad Mahdavi (Pennsylvania State University, mzm616@psu.edu); Anand Sivasubramaniam (Pennsylvania State University, anand@cse.psu.edu); Mahmut T. Kandemir (Pennsylvania State University, kandemir@cse.psu.edu) |
| Pseudocode | Yes | Algorithm 1: LAZYGCN training algorithm (see the hedged training-loop sketch after this table) |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about the release of its source code. |
| Open Datasets | Yes | We evaluate the effectiveness of LAZYGCN under the inductive supervised setting on the following real-world datasets: Pubmed, PPI-Large, Flickr, Reddit, Yelp, and Amazon. Detailed information on these datasets is summarized in Table 1. |
| Dataset Splits | Yes | Table 1 (summary of dataset statistics; * indicates a multi-label dataset) lists nodes / edges / degree / features / classes / train–validation–test split: Pubmed: 19,717 / 44,338 / 3 / 500 / 3 / 92%/3%/5%; PPI-Large*: 56,944 / 1,612,348 / 15 / 50 / 121 / 66%/12%/22%; Flickr: 89,250 / 899,756 / 10 / 500 / 7 / 50%/25%/25%; Reddit: 232,965 / 11,606,919 / 50 / 602 / 41 / 66%/10%/24%; Yelp*: 716,847 / 13,954,819 / 19 / 300 / 100 / 75%/15%/10%; Amazon*: 1,598,960 / 264,339,468 / 124 / 200 / 107 / 78%/5%/15% |
| Hardware Specification | Yes | For instance, the memory capacity on a very recent GPU card, such as NVIDIA Tesla V100, is at most 32 GB |
| Software Dependencies | No | We implemented all these algorithms alongside LAZYGCN, using PyTorch [22] and PyTorch Geometric [10] for sparse matrix operations. |
| Experiment Setup | Yes | All our experiments are conducted using a 3-layer GCN with a hidden dimension of 512 and the Adam optimizer with a learning rate of 10^-3. Test and validation accuracies (F1 score) are obtained by running the full-batch GCN. For nodewise sampling we used 5 neighbors, for layerwise we used a sample size of 512, and for subgraph we used a sample size equal to the mini-batch size. For LAZYGCN training, we used fixed R = 2 and ρ = 1.1 unless otherwise stated. (A minimal configuration sketch follows the table.) |
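
For readers reproducing Algorithm 1 without released source code, the following is a minimal sketch of the paper's core idea: sampling *when* is decoupled from sampling *how*, so a mega-batch is sampled once per outer round and then recycled for several inner updates. The helpers `sampler()`, `mega_batch.subsample()`, and `model.loss()` are hypothetical stand-ins, and the geometric recycling schedule `⌈R·ρ^k⌉` is an assumption based only on the reported roles of R (initial recycling size) and ρ (growth rate), not the authors' exact pseudocode.

```python
import math
import torch

def train_lazygcn(model, optimizer, sampler, num_rounds, R=2, rho=1.1):
    """Hedged sketch of a LazyGCN-style training loop.

    Decouples *when* to sample (once per outer round) from *how* to
    sample (delegated to `sampler`, which may implement nodewise,
    layerwise, or subgraph sampling). All helper names are hypothetical.
    """
    for k in range(num_rounds):
        mega_batch = sampler()                 # sample once per round (assumed helper)
        inner_steps = math.ceil(R * rho ** k)  # assumed geometric recycling schedule
        for _ in range(inner_steps):
            batch = mega_batch.subsample()     # recycle the same sample (assumed helper)
            optimizer.zero_grad()
            loss = model.loss(batch)           # assumed helper returning a scalar loss
            loss.backward()
            optimizer.step()
```

Because the same sampled subgraph serves several gradient steps, the number of (CPU-bound) sampling calls drops roughly by a factor of the recycling size, which is where the reported speedup comes from.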
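As a companion to the Experiment Setup row, here is a minimal PyTorch Geometric sketch of the reported configuration: a 3-layer GCN with hidden dimension 512 trained with Adam at learning rate 10^-3. The module name `GCN3` and the example dimensions (Flickr's 500 features and 7 classes from Table 1) are illustrative assumptions, not the authors' code; choices the paper does not specify, such as dropout, are omitted.

```python
import torch
from torch_geometric.nn import GCNConv

class GCN3(torch.nn.Module):
    """Illustrative 3-layer GCN matching the reported hyperparameters."""
    def __init__(self, in_dim, num_classes, hidden=512):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.conv3 = GCNConv(hidden, num_classes)

    def forward(self, x, edge_index):
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        return self.conv3(x, edge_index)

# Example dimensions for Flickr (500 features, 7 classes) from Table 1.
model = GCN3(in_dim=500, num_classes=7)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # reported lr = 10^-3
```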