Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond

Authors: Oleg Platonov, Denis Kuznedelev, Artem Babenko, Liudmila Prokhorenkova

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we first characterize some existing graph datasets in terms of homophily and LI to see which structural patterns are currently covered. Then, we show that LI, despite being a very simple graph characteristic, much better agrees with GNN performance than homophily. (A sketch of both measures is given after this table.)
Researcher Affiliation | Collaboration | Oleg Platonov (HSE University; Yandex Research), Denis Kuznedelev (Yandex Research; Skoltech), Artem Babenko (Yandex Research), Liudmila Prokhorenkova (Yandex Research)
Pseudocode | No | The paper describes methods and theoretical concepts but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Implementations of all the graph measures discussed in the paper and examples of their usage are provided in this Colab notebook.
Open Datasets | Yes | cora, citeseer, and pubmed [6, 25, 38, 27, 46] are three classic paper citation network benchmarks. ogbn-arxiv and ogbn-products [14] are two datasets from the recently proposed Open Graph Benchmark.
Dataset Splits | Yes | For each synthetic graph, we create 10 random 50%/25%/25% train/validation/test splits. (A split-generation sketch is given after this table.)
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper states 'Our models are implemented using PyTorch [31] and DGL [44]' but does not specify version numbers for these dependencies.
Experiment Setup | Yes | For all models, we add a two-layer MLP after every graph neighborhood aggregation layer and further augment all models with skip connections [11], layer normalization [2], and GELU activation functions [12]. For all models, we use two graph neighborhood aggregation layers and a hidden dimension of 512. We use the Adam [15] optimizer with a learning rate of 3·10⁻⁵ and train for 1000 steps, selecting the best step based on validation set performance. We use a dropout probability of 0.2 during training. (A model and training sketch is given after this table.)
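
Homophily and label informativeness (LI), referenced in the Research Type row above, are both computable from an edge list and node labels. Below is a minimal NumPy sketch following the paper's definition LI = I(y_u, y_v) / H(y_u) for a uniformly sampled edge (u, v); the function names and toy graph are ours, not the paper's Colab implementation.

```python
import numpy as np
from collections import Counter

def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a class label."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)

def label_informativeness(edges, labels):
    """LI = I(y_u, y_v) / H(y_u) for a uniformly random edge (u, v).

    Both orientations of each undirected edge are counted, so the
    marginal over one endpoint is the degree-weighted class distribution.
    Assumes at least two classes appear among edge endpoints
    (otherwise H(y_u) = 0).
    """
    pair_counts = Counter()
    for u, v in edges:
        pair_counts[(labels[u], labels[v])] += 1
        pair_counts[(labels[v], labels[u])] += 1
    total = sum(pair_counts.values())
    p_joint = np.array([c / total for c in pair_counts.values()])
    marginal = Counter()
    for (c1, _), count in pair_counts.items():
        marginal[c1] += count
    p_marg = np.array([c / total for c in marginal.values()])
    h_joint = -(p_joint * np.log(p_joint)).sum()
    h_marg = -(p_marg * np.log(p_marg)).sum()
    # I(y_u, y_v) = 2 * H(y_u) - H(y_u, y_v), so LI = 2 - H_joint / H_marg.
    return 2.0 - h_joint / h_marg

# Toy usage: a 4-cycle with alternating labels is perfectly heterophilous,
# yet neighbor labels fully determine a node's class.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
labels = {0: 0, 1: 1, 2: 0, 3: 1}
print(edge_homophily(edges, labels))         # 0.0
print(label_informativeness(edges, labels))  # 1.0
```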
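
The Dataset Splits row quotes 10 random 50%/25%/25% train/validation/test splits. A sketch of generating such splits follows; the seeding scheme is an assumption, since the paper only states that the splits are random.

```python
import numpy as np

def random_splits(num_nodes, num_splits=10, fractions=(0.5, 0.25, 0.25)):
    """Generate `num_splits` random train/val/test node-index splits."""
    splits = []
    for seed in range(num_splits):
        rng = np.random.default_rng(seed)  # seeding scheme is an assumption
        perm = rng.permutation(num_nodes)
        n_train = int(fractions[0] * num_nodes)
        n_val = int(fractions[1] * num_nodes)
        splits.append({
            "train": perm[:n_train],
            "val": perm[n_train:n_train + n_val],
            "test": perm[n_train + n_val:],
        })
    return splits

splits = random_splits(num_nodes=1000)
assert len(splits) == 10 and len(splits[0]["train"]) == 500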
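
The Experiment Setup row pins down most architectural choices: a two-layer MLP after every aggregation layer, skip connections, layer normalization, GELU activations, two aggregation layers, hidden dimension 512, Adam with learning rate 3·10⁻⁵, 1000 training steps, and dropout 0.2. A PyTorch/DGL sketch under those constraints is below; the exact ordering of normalization, skip connection, and MLP is an assumption, and a GCN-style GraphConv stands in for whichever aggregation each of the paper's models uses.

```python
import torch
import torch.nn as nn
from dgl.nn import GraphConv

class GNNBlock(nn.Module):
    """One aggregation layer followed by a two-layer MLP, with a skip
    connection, layer normalization, GELU, and dropout (ordering assumed)."""

    def __init__(self, dim=512, dropout=0.2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.conv = GraphConv(dim, dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(dim, dim), nn.Dropout(dropout),
        )

    def forward(self, graph, x):
        h = self.conv(graph, self.norm(x))
        return x + self.mlp(h)  # skip connection

class GNN(nn.Module):
    def __init__(self, in_dim, num_classes, dim=512, num_layers=2):
        super().__init__()
        self.input_proj = nn.Linear(in_dim, dim)
        self.blocks = nn.ModuleList(GNNBlock(dim) for _ in range(num_layers))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, graph, x):
        x = self.input_proj(x)
        for block in self.blocks:
            x = block(graph, x)
        return self.head(x)

# Stated optimizer settings: Adam, lr = 3e-5, 1000 steps, with the best
# step selected by validation performance.
model = GNN(in_dim=128, num_classes=7)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
```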