Representing Long-Range Context for Graph Neural Networks with Global Attention
Authors: Zhanghao Wu, Paras Jain, Matthew Wright, Azalia Mirhoseini, Joseph E. Gonzalez, Ion Stoica
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our simple architecture leads to state-of-the-art results on several graph classification tasks, outperforming methods that explicitly encode graph structure. (Section 5: Experiments) |
| Researcher Affiliation | Collaboration | UC Berkeley, Google Brain |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for GraphTrans is available at https://github.com/ucbrise/graphtrans. |
| Open Datasets | Yes | We choose two commonly used graph classification benchmarks, NCI1 and NCI109 [31]. For chemical benchmarks, we evaluate our GraphTrans on a dataset larger than the NCI datasets, molpcba from the Open Graph Benchmark (OGB) [15]. For the computer programming benchmark, we also adopt a large dataset, code2 from OGB... |
| Dataset Splits | Yes | We follow the settings in [20, 2] for the NCI1 and NCI109, randomly splitting the dataset into training, validation, and test sets by a ratio of 8:1:1. A hedged loading sketch is given below the table. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions the Adam optimizer and common model architectures but does not specify version numbers for key software components or libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | All of our models are trained with the Adam optimizer [17] with a learning rate of 0.0001, a weight decay of 0.0001, and the default Adam β parameters. All Transformer modules used in our experiments have an embedding dimension d_TF of 128 and a hidden dimension of 512 in the feedforward subnetwork. We trained GraphTrans on both the NCI1 and NCI109 datasets for 100 epochs with a batch size of 256. For both GNN and Transformer modules, we apply a dropout of 0.3. We train all the models for 30 epochs with a batch size of 16... A hedged configuration sketch follows the table. |
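To make the dataset and split rows above actionable for reproduction, here is a minimal loading sketch assuming the `ogb` and `torch_geometric` packages. The directory paths and the manual 8:1:1 random split are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch of loading the benchmarks named in the table.
import torch
from ogb.graphproppred import PygGraphPropPredDataset
from torch_geometric.datasets import TUDataset

# OGB datasets ship with an official split.
molpcba = PygGraphPropPredDataset(name="ogbg-molpcba", root="data/ogb")
split_idx = molpcba.get_idx_split()          # keys: "train", "valid", "test"

code2 = PygGraphPropPredDataset(name="ogbg-code2", root="data/ogb")

# NCI1 / NCI109 come from the TU collection; the paper reports a random
# 8:1:1 train/validation/test split.
nci1 = TUDataset(root="data/tu", name="NCI1")
perm = torch.randperm(len(nci1))
n_train, n_val = int(0.8 * len(nci1)), int(0.1 * len(nci1))
train_set = nci1[perm[:n_train]]
val_set = nci1[perm[n_train:n_train + n_val]]
test_set = nci1[perm[n_train + n_val:]]
```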
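The hyperparameters quoted in the Experiment Setup row can also be written out as a short configuration sketch. Only the numeric values (learning rate, weight decay, d_TF, feedforward width, dropout, default Adam betas) are taken from the paper; the head count, layer count, and the stand-in module wiring are assumptions rather than the released GraphTrans implementation.

```python
# Hedged sketch of the reported optimizer / Transformer hyperparameters.
import torch
import torch.nn as nn

D_TF = 128          # Transformer embedding dimension d_TF (from the paper)
FFN_DIM = 512       # feedforward hidden dimension (from the paper)
DROPOUT = 0.3       # dropout applied to both GNN and Transformer modules

# Stand-in Transformer encoder with the reported dimensions; in the released
# code this module sits on top of a GNN that first embeds each node.
encoder_layer = nn.TransformerEncoderLayer(
    d_model=D_TF,
    nhead=4,                      # assumption: head count is not quoted in the table
    dim_feedforward=FFN_DIM,
    dropout=DROPOUT,
)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=4)  # layer count is an assumption

optimizer = torch.optim.Adam(
    transformer.parameters(),
    lr=1e-4,                      # learning rate from the paper
    weight_decay=1e-4,            # weight decay from the paper
    betas=(0.9, 0.999),           # default Adam betas, as stated in the paper
)
```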