Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond
Authors: Oleg Platonov, Denis Kuznedelev, Artem Babenko, Liudmila Prokhorenkova
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we first characterize some existing graph datasets in terms of homophily and LI to see which structural patterns are currently covered. Then, we show that LI, despite being a very simple graph characteristic, much better agrees with GNN performance than homophily. (A sketch of both measures appears below the table.) |
| Researcher Affiliation | Collaboration | Oleg Platonov (HSE University; Yandex Research), Denis Kuznedelev (Yandex Research; Skoltech), Artem Babenko (Yandex Research), Liudmila Prokhorenkova (Yandex Research) |
| Pseudocode | No | The paper describes methods and theoretical concepts but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Implementations of all the graph measures discussed in the paper and examples of their usage are provided in this Colab notebook. |
| Open Datasets | Yes | cora, citeseer, and pubmed [6, 25, 38, 27, 46] are three classic paper citation network benchmarks. ogbn-arxiv and ogbn-products [14] are two datasets from the recently proposed Open Graph Benchmark. |
| Dataset Splits | Yes | For each synthetic graph, we create 10 random 50%/25%/25% train/validation/test splits. (A split-generation sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper states 'Our models are implemented using PyTorch [31] and DGL [44]' but does not specify version numbers for these software dependencies or the programming language. |
| Experiment Setup | Yes | For all models, we add a two-layer MLP after every graph neighborhood aggregation layer and further augment all models with skip connections [11], layer normalization [2], and GELU activation functions [12]. For all models, we use two graph neighborhood aggregation layers and a hidden dimension of 512. We use the Adam [15] optimizer with a learning rate of 3 × 10⁻⁵ and train for 1000 steps, selecting the best step based on validation set performance. We use a dropout probability of 0.2 during training. (A model/training sketch appears below the table.) |
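The two graph characteristics discussed in the Research Type evidence, edge homophily and label informativeness (LI = I(y_ξ, y_η) / H(y_ξ) for a uniformly sampled edge (ξ, η)), can both be computed from an edge list and integer node labels. Below is a minimal NumPy sketch of that definition; the authors' own implementations live in the linked Colab notebook, so treat this as illustrative only. The function names and the (E, 2) edge-array convention are ours.

```python
import numpy as np

def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    u, v = edges[:, 0], edges[:, 1]
    return float(np.mean(labels[u] == labels[v]))

def label_informativeness(edges, labels):
    """LI = I(y_u, y_v) / H(y_u) over a uniformly sampled edge.
    For an undirected graph, pass every edge in both directions
    so the endpoint marginals are degree-weighted."""
    u, v = edges[:, 0], edges[:, 1]
    n_classes = int(labels.max()) + 1
    # Joint distribution of endpoint labels over edges.
    joint = np.zeros((n_classes, n_classes))
    np.add.at(joint, (labels[u], labels[v]), 1.0)
    joint /= joint.sum()
    p_u, p_v = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mutual_info = np.sum(joint[nz] * np.log(joint[nz] / np.outer(p_u, p_v)[nz]))
    entropy = -np.sum(p_u[p_u > 0] * np.log(p_u[p_u > 0]))
    return float(mutual_info / entropy)

# Example: a 4-node path graph 0-1-2-3 with labels [0, 0, 1, 1],
# edges listed in both directions.
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2]])
labels = np.array([0, 0, 1, 1])
print(edge_homophily(edges, labels), label_informativeness(edges, labels))
```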
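The Dataset Splits row reports 10 random 50%/25%/25% node splits per synthetic graph. A minimal sketch of how such splits can be generated is below; the paper does not specify the random-number generator or seeds, so those details (and the `num_nodes` value in the usage line) are assumptions.

```python
import numpy as np

def random_split(num_nodes, train_frac=0.5, val_frac=0.25, seed=0):
    """One random 50%/25%/25% train/validation/test node split."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# Ten independent splits per graph, as in the paper (num_nodes is a placeholder).
splits = [random_split(num_nodes=1000, seed=s) for s in range(10)]
```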
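The Experiment Setup row translates into a small PyTorch module. The sketch below assumes a DGL-style aggregation layer and a pre-norm arrangement of the normalization and skip connection; the paper confirms the components (a two-layer MLP after every aggregation layer, skip connections, layer norm, GELU, hidden dimension 512, dropout 0.2, Adam with learning rate 3 × 10⁻⁵ for 1000 steps) but not their exact wiring, so this is a sketch rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class GNNBlock(nn.Module):
    """One neighborhood-aggregation layer followed by a two-layer MLP,
    with layer normalization, a skip connection, GELU, and dropout,
    per the reported setup (exact wiring assumed; pre-norm style shown)."""

    def __init__(self, agg_layer, dim=512, dropout=0.2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.agg = agg_layer  # e.g. dgl.nn.GraphConv(dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(dim, dim),
            nn.Dropout(dropout),
        )

    def forward(self, graph, x):
        h = self.agg(graph, self.norm(x))  # aggregate over neighbors
        return x + self.mlp(h)             # two-layer MLP + skip connection

# Training setup as reported: Adam with lr = 3e-5 for 1000 steps, selecting
# the best step by validation performance (training loop omitted).
# optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
```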