A Metadata-Driven Approach to Understand Graph Neural Networks
Authors: Ting Wei Li, Qiaozhu Mei, Jiaqi Ma
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a multivariate sparse regression analysis on the metadata derived from benchmarking GNN performance across diverse datasets, yielding a set of salient data properties. To validate the effectiveness of our data-driven approach, we focus on one identified data property, the degree distribution, and investigate how this property influences GNN performance through theoretical analysis and controlled experiments. |
| Researcher Affiliation | Academia | Ting Wei Li (University of Michigan, tingwl@umich.edu); Qiaozhu Mei (University of Michigan, qmei@umich.edu); Jiaqi Ma (University of Illinois Urbana-Champaign, jiaqima@illinois.edu) |
| Pseudocode | No | The paper describes methods and processes in text and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the authors have made the source code for their proposed methodology publicly available. |
| Open Datasets | Yes | We obtain both the benchmark datasets and the model performance using the Graph Learning Indexer (GLI) library [27]. We include the following benchmark datasets in our regression analysis: cora [42], citeseer [42], pubmed [42], texas [33], cornell [33], wisconsin [33], actor [33], squirrel [33], chameleon [33], arxiv-year [23], snap-patents [23], penn94 [23], pokec [23], genius [23], and twitch-gamers [23]. |
| Dataset Splits | Yes | We randomly split the nodes into training, validation, and test sets with a ratio of 3:1:1. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., specific GPU/CPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions an R package, MSGLasso [21], and the optimizers Adam [18] and AdamW [26], but does not provide specific version numbers for these software components or for broader frameworks such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | Specifically, we set learning rate = 0.01, weight decay = 0.001, dropout rate = 0.6, max epoch = 10000, and batch size = 256. We use Adam [18] as an optimizer for all models except LINKX. AdamW [26] is used with LINKX in order to comply with Lim et al. [23]. |
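The "Research Type" row quotes the paper's multivariate sparse regression of GNN performance on dataset properties. Below is a minimal sketch of that kind of analysis, assuming a hypothetical feature matrix of per-dataset properties and a response matrix of per-model accuracies; the paper itself uses the R package MSGLasso (multivariate sparse group lasso), while scikit-learn's `MultiTaskLasso` is used here only as a rough stand-in that also yields row-sparse coefficients.

```python
# Sketch of a metadata-driven sparse regression, NOT the authors' MSGLasso pipeline.
import numpy as np
from sklearn.linear_model import MultiTaskLasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical metadata: 15 benchmark datasets x 8 data properties
# (e.g., edge homophily, a degree-distribution statistic, ...).
X = rng.normal(size=(15, 8))
# Hypothetical responses: test accuracy of 6 GNN models on each dataset.
Y = rng.normal(size=(15, 6))

# Standardize features so coefficient magnitudes are comparable.
X_std = StandardScaler().fit_transform(X)

# Joint sparse regression: a data property is either selected for all
# models (nonzero coefficient row) or dropped entirely.
reg = MultiTaskLasso(alpha=0.1).fit(X_std, Y)

# Properties with nonzero coefficient rows play the role of the
# "salient data properties" in this sketch.
salient = np.flatnonzero(np.linalg.norm(reg.coef_.T, axis=1) > 1e-8)
print("Selected property indices:", salient)
```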
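The "Dataset Splits" row reports a random 3:1:1 node split. A minimal sketch of such a split is shown below, assuming only the number of nodes is known; the helper name and seed are illustrative, not taken from the paper.

```python
# Sketch of a 3:1:1 random node split (train/validation/test).
import numpy as np

def random_split(num_nodes, ratios=(3, 1, 1), seed=0):
    """Shuffle node indices and cut them into train/val/test sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    total = sum(ratios)
    n_train = num_nodes * ratios[0] // total
    n_val = num_nodes * ratios[1] // total
    return (perm[:n_train],                  # training nodes
            perm[n_train:n_train + n_val],   # validation nodes
            perm[n_train + n_val:])          # test nodes

# Example: cora has 2708 nodes.
train_idx, val_idx, test_idx = random_split(2708)
print(len(train_idx), len(val_idx), len(test_idx))
```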
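The "Experiment Setup" row lists the reported hyperparameters and the Adam/AdamW choice. The sketch below wires those values into a PyTorch optimizer, assuming a model object `gnn` whose layers already use dropout 0.6; the `is_linkx` flag is a placeholder for selecting AdamW only for LINKX, not the authors' implementation.

```python
# Sketch of the reported training configuration in PyTorch.
import torch

def build_optimizer(gnn: torch.nn.Module, is_linkx: bool = False):
    """Adam for all models, AdamW for LINKX, with the reported lr and weight decay."""
    opt_cls = torch.optim.AdamW if is_linkx else torch.optim.Adam
    return opt_cls(gnn.parameters(), lr=0.01, weight_decay=0.001)

# Other reported training hyperparameters.
MAX_EPOCHS = 10_000
BATCH_SIZE = 256
```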