reproducibilityindex.ai

Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond

Authors: Oleg Platonov, Denis Kuznedelev, Artem Babenko, Liudmila Prokhorenkova

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we first characterize some existing graph datasets in terms of homophily and LI to see which structural patterns are currently covered. Then, we show that LI, despite being a very simple graph characteristic, much better agrees with GNN performance than homophily.
Researcher Affiliation	Collaboration	Oleg Platonov (HSE University, Yandex Research); Denis Kuznedelev (Yandex Research, Skoltech); Artem Babenko (Yandex Research); Liudmila Prokhorenkova (Yandex Research)
Pseudocode	No	The paper describes methods and theoretical concepts but does not include any explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Implementations of all the graph measures discussed in the paper and examples of their usage are provided in this Colab notebook.
Open Datasets	Yes	cora, citeseer, and pubmed [6, 25, 38, 27, 46] are three classic paper citation network benchmarks. Ogbn-arxiv and Ogbn-products [14] are two datasets from the recently proposed Open Graph Benchmark.
Dataset Splits	Yes	For each synthetic graph, we create 10 random 50%/25%/25% train/validation/test splits.
Hardware Specification	No	The paper does not provide specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies	No	The paper states 'Our models are implemented using PyTorch [31] and DGL [44]' but does not specify version numbers for these software dependencies or the programming language.
Experiment Setup	Yes	For all models, we add a two-layer MLP after every graph neighborhood aggregation layer and further augment all models with skip connections [11], layer normalization [2], and GELU activation functions [12]. For all models, we use two graph neighborhood aggregation layers and hidden dimension of 512. We use Adam [15] optimizer with a learning rate of 3 × 10^-5 and train for 1000 steps, selecting the best step based on the validation set performance. We use a dropout probability of 0.2 during training.
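
The Dataset Splits row quotes 10 random 50%/25%/25% train/validation/test splits per synthetic graph. The following is a minimal sketch of how such splits could be generated, not the authors' code. The function name `make_random_splits`, the `seed` parameter, and the choice to assign any rounding remainder to the test set are assumptions made for illustration; the source states none of them.

```python
import random


def make_random_splits(num_nodes, num_splits=10, fractions=(0.5, 0.25, 0.25), seed=0):
    """Return a list of (train, val, test) tuples of node-index lists.

    Hypothetical helper: the seed and the rounding rule are assumptions,
    not details taken from the paper.
    """
    rng = random.Random(seed)
    num_train = int(fractions[0] * num_nodes)
    num_val = int(fractions[1] * num_nodes)
    splits = []
    for _ in range(num_splits):
        # Shuffle all node indices, then cut the permutation into three parts.
        perm = list(range(num_nodes))
        rng.shuffle(perm)
        train = perm[:num_train]
        val = perm[num_train:num_train + num_val]
        test = perm[num_train + num_val:]  # remainder goes to the test set
        splits.append((train, val, test))
    return splits


# Example: 10 splits over a graph with 1000 nodes.
splits = make_random_splits(num_nodes=1000)
```

Each split is drawn independently from the same seeded generator, so the full set of 10 splits is reproducible for a fixed seed while the individual splits differ from one another.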