Open Graph Benchmark: Datasets for Machine Learning on Graphs

Authors: Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research.
Researcher Affiliation | Collaboration | Department of Computer Science and Department of Chemistry, Stanford University; Department of Computer Science, TU Dortmund University; Department of Biomedical Informatics, Harvard University; Microsoft Research, Redmond
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu. [...] both of which are provided by our OGB Python package (https://github.com/snap-stanford/ogb). (See the loader sketch after the table.)
Open Datasets | Yes | OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu. All the datasets are constructed by ourselves, except for ogbn-products, ogbg-molpcba, and ogbg-molhiv, whose graphs and target labels are adopted from Chiang et al. [17] and Wu et al. [92].
Dataset Splits | Yes | For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. Specifically, we sort the products according to their sales ranking and use the top 8% for training, next top 2% for validation, and the rest for testing. [...] Table 1: Summary of currently-available OGB datasets, including Split Scheme and Ratio columns. (See the split sketch after the table.)
Hardware Specification | Yes | Table 3: Results for ogbn-products. Requires a GPU with 33GB of memory. [...] Table 4: Results for ogbl-wikikg. Requires a GPU with 48GB of memory. [...] the upper-half baselines are implemented on a single commodity GPU with 11GB memory, while the bottom-half baselines are implemented on a high-end GPU with 48GB memory.
Software Dependencies | No | The paper mentions software like PyTorch, PyTorch Geometric, and Deep Graph Library but does not provide specific version numbers for these dependencies, which are required for reproducibility. (See the version-logging sketch after the table.)
Experiment Setup | Yes | All models are trained with a fixed hidden dimensionality of 256, a fixed number of three layers, and a tuned dropout ratio ∈ {0.0, 0.5}. [...] We use 5-layer GNNs, average graph pooling, a hidden dimensionality of 300, and a tuned dropout ratio ∈ {0, 0.5}. (See the model sketch after the table.)
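
The Open Source Code and Open Datasets rows refer to the OGB Python package, which bundles loaders, standardized splits, and per-dataset evaluators. A minimal sketch of that documented workflow, using the ogbn-products dataset named in the paper (this is not the authors' baseline code, just the package's public entry points):

```python
# Minimal sketch of the OGB loading/evaluation workflow.
# Requires: pip install ogb torch torch-geometric
from ogb.nodeproppred import Evaluator, PygNodePropPredDataset

# Download ogbn-products and fetch its standardized split indices.
dataset = PygNodePropPredDataset(name="ogbn-products")
split_idx = dataset.get_idx_split()  # dict with 'train', 'valid', 'test' index tensors
graph = dataset[0]  # a single torch_geometric.data.Data object

# The evaluator enforces the dataset's official metric (accuracy for ogbn-products).
evaluator = Evaluator(name="ogbn-products")
# result = evaluator.eval({"y_true": y_true, "y_pred": y_pred})  # both of shape (N, 1)
```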
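
The sales-ranking split quoted in the Dataset Splits row (top 8% train, next top 2% validation, rest test) is simple to express directly. A sketch, assuming products are already sorted by sales ranking with best sellers first; the function name and variables are illustrative, not from the OGB source:

```python
import numpy as np

def sales_rank_split(num_products: int) -> dict:
    """Split products sorted by sales ranking (best first):
    top 8% train, next top 2% validation, remaining 90% test."""
    idx = np.arange(num_products)
    n_train = int(0.08 * num_products)
    n_valid = int(0.02 * num_products)
    return {
        "train": idx[:n_train],
        "valid": idx[n_train:n_train + n_valid],
        "test": idx[n_train + n_valid:],
    }

split = sales_rank_split(2_449_029)  # node count of ogbn-products per the paper
```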
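
Because the Software Dependencies row flags missing version numbers, anyone reproducing the baselines should record their own environment. A small sketch (assumes torch and torch_geometric are installed; Deep Graph Library would be logged the same way):

```python
# Log library versions, since the paper names PyTorch, PyTorch Geometric,
# and Deep Graph Library without pinning versions.
import torch
import torch_geometric

print("torch:", torch.__version__)
print("torch_geometric:", torch_geometric.__version__)
print("cuda available:", torch.cuda.is_available())
```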
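
The node-classification setup quoted in the Experiment Setup row (hidden dimensionality 256, three layers, dropout tuned over {0.0, 0.5}) maps onto a standard GCN. A sketch in PyTorch Geometric, not the authors' exact baseline implementation:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    """Three-layer GCN with hidden dimensionality 256, matching the
    hyperparameters quoted in the Experiment Setup row."""
    def __init__(self, in_dim: int, num_classes: int,
                 hidden: int = 256, dropout: float = 0.5):
        super().__init__()
        self.convs = torch.nn.ModuleList([
            GCNConv(in_dim, hidden),
            GCNConv(hidden, hidden),
            GCNConv(hidden, num_classes),
        ])
        self.dropout = dropout  # tuned over {0.0, 0.5} in the paper

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.relu(conv(x, edge_index))
            x = F.dropout(x, p=self.dropout, training=self.training)
        return self.convs[-1](x, edge_index)
```

The graph-property-prediction setup in the same row (5-layer GNNs, hidden dimensionality 300, average graph pooling) follows the same pattern, with five convolution layers and torch_geometric.nn.global_mean_pool applied before the output layer.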