Open Graph Benchmark: Datasets for Machine Learning on Graphs

Authors: Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research.
Researcher Affiliation | Collaboration | Department of Computer Science and Department of Chemistry, Stanford University; Department of Computer Science, TU Dortmund University; Department of Biomedical Informatics, Harvard University; Microsoft Research, Redmond
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu. [...] both of which are provided by our OGB Python package (https://github.com/snap-stanford/ogb). (See the loader sketch after the table.)
Open Datasets | Yes | OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu. All the datasets are constructed by ourselves, except for ogbn-products, ogbg-molpcba, and ogbg-molhiv, whose graphs and target labels are adopted from Chiang et al. [17] and Wu et al. [92].
Dataset Splits | Yes | For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. Specifically, we sort the products according to their sales ranking and use the top 8% for training, next top 2% for validation, and the rest for testing. [...] Table 1: Summary of currently-available OGB datasets, including Split Scheme and Ratio columns. (See the split sketch after the table.)
Hardware Specification | Yes | Table 3: Results for ogbn-products. Requires a GPU with 33GB of memory. [...] Table 4: Results for ogbl-wikikg. Requires a GPU with 48GB of memory. [...] the upper-half baselines are implemented on a single commodity GPU with 11GB memory, while the bottom-half baselines are implemented on a high-end GPU with 48GB memory.
Software Dependencies | No | The paper mentions software like PyTorch, PyTorch Geometric, and Deep Graph Library but does not provide specific version numbers for these dependencies, which are required for reproducibility. (See the version-logging sketch after the table.)
Experiment Setup | Yes | All models are trained with a fixed hidden dimensionality of 256, a fixed number of three layers, and a tuned dropout ratio ∈ {0.0, 0.5}. [...] We use 5-layer GNNs, average graph pooling, a hidden dimensionality of 300, and a tuned dropout ratio ∈ {0, 0.5}. (See the model sketch after the table.)
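
The Open Source Code and Open Datasets rows refer to the OGB Python package, which bundles loaders, standardized splits, and per-dataset evaluators. A minimal sketch of that documented workflow, using the ogbn-products dataset named in the paper (this is not the authors' baseline code, just the package's public entry points):

```python
# Minimal sketch of the OGB loading/evaluation workflow.
# Requires: pip install ogb torch torch-geometric
from ogb.nodeproppred import Evaluator, PygNodePropPredDataset

# Download ogbn-products and fetch its standardized split indices.
dataset = PygNodePropPredDataset(name="ogbn-products")
split_idx = dataset.get_idx_split()  # dict with 'train', 'valid', 'test' index tensors
graph = dataset[0]  # a single torch_geometric.data.Data object

# The evaluator enforces the dataset's official metric (accuracy for ogbn-products).
evaluator = Evaluator(name="ogbn-products")
# result = evaluator.eval({"y_true": y_true, "y_pred": y_pred})  # both of shape (N, 1)
```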
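
The sales-ranking split quoted in the Dataset Splits row (top 8% train, next top 2% validation, rest test) is simple to express directly. A sketch, assuming products are already sorted by sales ranking with best sellers first; the function name and variables are illustrative, not from the OGB source:

```python
import numpy as np

def sales_rank_split(num_products: int) -> dict:
    """Split products sorted by sales ranking (best first):
    top 8% train, next top 2% validation, remaining 90% test."""
    idx = np.arange(num_products)
    n_train = int(0.08 * num_products)
    n_valid = int(0.02 * num_products)
    return {
        "train": idx[:n_train],
        "valid": idx[n_train:n_train + n_valid],
        "test": idx[n_train + n_valid:],
    }

split = sales_rank_split(2_449_029)  # node count of ogbn-products per the paper
```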
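
Because the Software Dependencies row flags missing version numbers, anyone reproducing the baselines should record their own environment. A small sketch (assumes torch and torch_geometric are installed; Deep Graph Library would be logged the same way):

```python
# Log library versions, since the paper names PyTorch, PyTorch Geometric,
# and Deep Graph Library without pinning versions.
import torch
import torch_geometric

print("torch:", torch.__version__)
print("torch_geometric:", torch_geometric.__version__)
print("cuda available:", torch.cuda.is_available())
```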
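
The node-classification setup quoted in the Experiment Setup row (hidden dimensionality 256, three layers, dropout tuned over {0.0, 0.5}) maps onto a standard GCN. A sketch in PyTorch Geometric, not the authors' exact baseline implementation:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    """Three-layer GCN with hidden dimensionality 256, matching the
    hyperparameters quoted in the Experiment Setup row."""
    def __init__(self, in_dim: int, num_classes: int,
                 hidden: int = 256, dropout: float = 0.5):
        super().__init__()
        self.convs = torch.nn.ModuleList([
            GCNConv(in_dim, hidden),
            GCNConv(hidden, hidden),
            GCNConv(hidden, num_classes),
        ])
        self.dropout = dropout  # tuned over {0.0, 0.5} in the paper

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.relu(conv(x, edge_index))
            x = F.dropout(x, p=self.dropout, training=self.training)
        return self.convs[-1](x, edge_index)
```

The graph-property-prediction setup in the same row (5-layer GNNs, hidden dimensionality 300, average graph pooling) follows the same pattern, with five convolution layers and torch_geometric.nn.global_mean_pool applied before the output layer.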