Open Graph Benchmark: Datasets for Machine Learning on Graphs
Authors: Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. |
| Researcher Affiliation | Collaboration | ¹Department of Computer Science and ⁵Department of Chemistry, Stanford University; ²Department of Computer Science, TU Dortmund University; ³Department of Biomedical Informatics, Harvard University; ⁴Microsoft Research, Redmond |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu. [...] both of which are provided by our OGB Python package (https://github.com/snap-stanford/ogb). |
| Open Datasets | Yes | OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu. All the datasets are constructed by ourselves, except for ogbn-products, ogbg-molpcba, and ogbg-molhiv, whose graphs and target labels are adopted from Chiang et al. [17] and Wu et al. [92]. |
| Dataset Splits | Yes | For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. Specifically, we sort the products according to their sales ranking and use the top 8% for training, next top 2% for validation, and the rest for testing. [...] Table 1: Summary of currently-available OGB datasets. |
| Hardware Specification | Yes | Table 3: Results for ogbn-products. Requires a GPU with 33GB of memory. [...] Table 4: Results for ogbl-wikikg. Requires a GPU with 48GB of memory. [...] the upper-half baselines are implemented on a single commodity GPU with 11GB memory, while the bottom-half baselines are implemented on a high-end GPU with 48GB memory. |
| Software Dependencies | No | The paper mentions software like PYTORCH, PYTORCH GEOMETRIC, and DEEP GRAPH LIBRARY but does not provide specific version numbers for these dependencies, which would be needed to fully reproduce the software environment. |
| Experiment Setup | Yes | All models are trained with a fixed hidden dimensionality of 256, a fixed number of three layers, and a tuned dropout ratio ∈ {0.0, 0.5}. [...] We use 5-layer GNNs, average graph pooling, a hidden dimensionality of 300, and a tuned dropout ratio ∈ {0, 0.5}. |
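The sales-ranking split quoted in the Dataset Splits row (top 8% of products for training, next 2% for validation, rest for testing) can be sketched in plain Python. The function name and fractions-as-parameters below are illustrative, not part of the OGB package; in practice the OGB data loaders ship the precomputed split indices.

```python
# Sketch of the ogbn-products split scheme: products are assumed to be
# sorted from best- to worst-selling, and indices are partitioned by rank.
def sales_ranking_split(num_products, train_frac=0.08, valid_frac=0.02):
    """Return (train, valid, test) index lists for products already
    sorted by sales ranking (index 0 = best-selling)."""
    n_train = int(num_products * train_frac)
    n_valid = int(num_products * valid_frac)
    train_idx = list(range(0, n_train))
    valid_idx = list(range(n_train, n_train + n_valid))
    test_idx = list(range(n_train + n_valid, num_products))
    return train_idx, valid_idx, test_idx

train, valid, test = sales_ranking_split(1000)
# train covers ranks 0..79, valid 80..99, test 100..999
```

When using the released package, these splits are retrieved directly (e.g. via `dataset.get_idx_split()` on an OGB dataset object) rather than recomputed, which is what makes the evaluation protocol uniform across leaderboard entries.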