Layer-Neighbor Sampling --- Defusing Neighborhood Explosion in GNNs
Authors: Muhammed Fatih Balın, Ümit V. Çatalyürek
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate the performance of each method in the node-prediction setting on the following datasets: reddit, products, yelp, flickr. [...] We experimentally verify our findings and show that our proposed sampling algorithm LABOR outperforms both neighbor and layer sampling approaches. |
| Researcher Affiliation | Collaboration | Muhammed Fatih Balın balin@gatech.edu Ümit V. Çatalyürek umit@gatech.edu [...] Part of the GPU acceleration in this work was implemented during an internship at NVIDIA Corporation. School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA, USA; Amazon Web Services. This publication describes work performed at the Georgia Institute of Technology and is not associated with AWS. |
| Pseudocode | Yes | Algorithm 1 LABOR-i for uniform edge weights |
| Open Source Code | Yes | We implemented LABOR variants and PLADIES in DGL [Wang et al., 2019], and carried out our experiments using DGL with the Pytorch backend [Paszke et al., 2019], URLs are provided in Appendix A.8. [...] Our CUDA and C++ implementations are available in DGL starting with version 1.0 as dgl.dataloading.LaborSampler and dgl.sampling.sample_labors. [...] Appendix A.8 Links to the experimental code: Experiments in Section 4 can be reproduced using https://github.com/dmlc/dgl/tree/f971e25a4dff33d5d219dccf523e32c62360ffd2/examples/pytorch/labor. Our initial contribution to DGL can be found in https://github.com/dmlc/dgl/pull/4668. |
| Open Datasets | Yes | In this section, we empirically evaluate the performance of each method in the node-prediction setting on the following datasets: reddit [Hamilton et al., 2017], products [Hu et al., 2020a], yelp, flickr [Zeng et al., 2020]. Details about these datasets are given in Table 1. |
| Dataset Splits | Yes | Table 1: Properties of the datasets used in experiments: numbers of vertices (|V|), edges (|E|), avg. degree (|E|/|V|), number of features, sampling budget used, training, validation and test vertex split. |
| Hardware Specification | Yes | The timing information was measured on an NVIDIA T4 GPU. [...] Table 5: The runtimes (ms) per iteration for the GATv2 model on NVIDIA A100 80GB [...] We ran this experiment on an A100 GPU and stored the input features on the main memory, which were accessed over the PCI-e directly during training by pinning their memory. |
| Software Dependencies | No | The paper states 'We implemented LABOR variants and PLADIES in DGL [Wang et al., 2019], and carried out our experiments using DGL with the Pytorch backend [Paszke et al., 2019]', but it does not give specific version numbers for DGL or PyTorch; only citations to the general framework papers are provided. |
| Experiment Setup | Yes | We evaluate all the methods on the GCN model in (2) with 3 layers, with 256 hidden dimension and residual skip connections enabled. We use the Adam optimizer [Kingma and Ba, 2014] with 0.001 learning rate. [...] In this experiment, we set the batch size to 1,000 and the fanout k = 10 for LABOR and NS methods [...] The hyperparameters of LADIES and PLADIES were picked to match LABOR-* so that all methods have the same vertex sampling budget in each layer (see Table 2). [...] The batch sizes used in Figure 2. These were chosen such that in expectation, each method samples with the same budget given in Table 1. [...] We tune the learning rate between [10⁻⁴, 10⁻¹], the batch size between [2¹⁰, 2¹⁵] and the fanout for each layer between [5, 25]. |
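
The excerpts above name the public DGL entry points (`dgl.dataloading.LaborSampler`, `dgl.sampling.sample_labors`) together with the sampling configuration (fanout k = 10, batch size 1,000). Below is a minimal sketch of how those pieces could fit together, assuming DGL ≥ 1.0 with the PyTorch backend; `g`, `train_nids`, and `device` are placeholders for a loaded dataset, not values taken from the paper.

```python
import dgl
import torch

# LaborSampler takes one fanout per layer, so k = 10 for a 3-layer model becomes [10, 10, 10].
sampler = dgl.dataloading.LaborSampler([10, 10, 10])

dataloader = dgl.dataloading.DataLoader(
    g,                # a DGLGraph, e.g. one of the benchmark graphs after loading
    train_nids,       # IDs of the training vertices
    sampler,
    batch_size=1000,  # batch size quoted in the experiment setup
    shuffle=True,
    drop_last=False,
    device=device,
)

for input_nodes, output_nodes, blocks in dataloader:
    # `blocks` is a list of message-flow graphs, one per GNN layer,
    # produced by LABOR sampling instead of plain neighbor sampling.
    pass
```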
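
The experiment-setup row also pins down the model side: a 3-layer GCN with a hidden dimension of 256, residual skip connections, and Adam at a 0.001 learning rate. The sketch below shows one way that configuration could look; the choice of `dgl.nn.GraphConv` and the exact placement of the residual connections are assumptions for illustration, and `num_features` / `num_classes` are dataset-dependent placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv

class ResGCN(nn.Module):
    """Illustrative 3-layer GCN with residual skip connections (details assumed)."""
    def __init__(self, in_feats, n_hidden, n_classes):
        super().__init__()
        self.layers = nn.ModuleList([
            GraphConv(in_feats, n_hidden),
            GraphConv(n_hidden, n_hidden),
            GraphConv(n_hidden, n_classes),
        ])

    def forward(self, blocks, x):
        h = x
        for i, (layer, block) in enumerate(zip(self.layers, blocks)):
            # In a DGL block, destination nodes appear first among the source nodes.
            h_dst = h[: block.num_dst_nodes()]
            h = layer(block, h)
            if h.shape == h_dst.shape:  # residual skip only where dimensions match
                h = h + h_dst
            if i != len(self.layers) - 1:
                h = F.relu(h)
        return h

model = ResGCN(num_features, 256, num_classes)  # placeholders for dataset-specific sizes
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
```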