Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Node Feature Extraction by Self-Supervised Multi-scale Neighborhood Prediction
Authors: Eli Chien, Wei-Cheng Chang, Cho-Jui Hsieh, Hsiang-Fu Yu, Jiong Zhang, Olgica Milenkovic, Inderjit S Dhillon
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the superior performance of GIANT over the standard GNN pipeline on Open Graph Benchmark datasets: For example, we improve the accuracy of the top-ranked method GAMLP from 68.25% to 69.67%, SGC from 63.29% to 66.10% and MLP from 47.24% to 61.10% on the ogbn-papers100M dataset by leveraging GIANT. Our implementation is public available. ... 5 EXPERIMENTS Evaluation Datasets. We consider node classification as our downstream task and evaluate GIANT on three large-scale OGB datasets (Hu et al., 2020a) with available raw text: ogbn-arxiv, ogbn-products, and ogbn-papers100M. The parameters of these datasets are given in Table 1 and detailed descriptions are available in the Appendix E.1. Following the OGB benchmarking protocol, we report the average test accuracy and the corresponding standard deviation by repeating 3 runs of each downstream GNN model. |
| Researcher Affiliation | Collaboration | Eli Chien University of Illinois Urbana-Champaign, USA EMAIL Wei-Cheng Chang Amazon, USA EMAIL Cho-Jui Hsieh University of California, Los Angeles, USA EMAIL Hsiang-Fu Yu, Jiong Zhang Amazon, USA EMAIL Olgica Milenkovic University of Illinois Urbana-Champaign, USA EMAIL Inderjit S. Dhillon Amazon, USA EMAIL |
| Pseudocode | No | No explicitly labeled pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | Our implementation is public available (footnote 1: https://github.com/amzn/pecos/tree/mainline/examples/giant-xrt). ... We provide our code in the supplementary material along with an easy-to-follow description and package dependency for reproducibility. Our experimental setting is stated in Section 5 and details pertaining to hyperparameters and computational environment are described in the Appendix. All tested methods are integrated in our code: https://github.com/amzn/pecos/tree/mainline/examples/giant-xrt. |
| Open Datasets | Yes | Table 1: Basic statistics of the OGB benchmark datasets (Hu et al., 2020a). Columns: #Nodes; #Edges; Avg. Node Degree; Split ratio (%); Metric. ogbn-arxiv: 169,343; 1,166,243; 13.7; 54/18/28; Accuracy. ogbn-products: 2,449,029; 61,859,140; 50.5; 8/2/90; Accuracy. ogbn-papers100M: 111,059,956; 1,615,685,872; 29.1; 78/8/14; Accuracy. ... E.1 DATASETS In this work, we choose node classification as our downstream task to focus. We conduct experiments on three large-scale datasets, ogbn-arxiv, ogbn-products and ogbn-papers100M as these are the only three datasets with raw text available in OGB. |
| Dataset Splits | Yes | Table 1: Basic statistics of the OGB benchmark datasets (Hu et al., 2020a). Columns: #Nodes; #Edges; Avg. Node Degree; Split ratio (%); Metric. ogbn-arxiv: 169,343; 1,166,243; 13.7; 54/18/28; Accuracy. ogbn-products: 2,449,029; 61,859,140; 50.5; 8/2/90; Accuracy. ogbn-papers100M: 111,059,956; 1,615,685,872; 29.1; 78/8/14; Accuracy. |
| Hardware Specification | Yes | E.5 COMPUTATIONAL ENVIRONMENT All experiments are conducted on the AWS p3dn.24xlarge instance, consisting of 96 Intel Xeon CPUs with 768 GB of RAM and 8 Nvidia V100 GPUs with 32 GB of memory each. |
| Software Dependencies | No | The paper mentions 'Pytorch Geometric Library (Fey & Lenssen, 2019)' and 'bert-base-uncased downloaded from Hugging Face' but does not specify exact version numbers for these software dependencies or any other core libraries. |
| Experiment Setup | Yes | E.3 HYPER-PARAMETERS OF GIANT-XRT AND BERT+LP In Table 4, we outline the pre-training hyper-parameter of GIANT-XRT for all three OGB benchmark datasets. We mostly follow the convention of XRTransformer (Zhang et al., 2021a) to set the hyper-parameters. ... E.4 HYPER-PARAMETERS OF DOWNSTREAM METHODS For the downstream models, we optimize the learning rate within {0.01, 0.001} for all models. For MLP, GraphSAGE and GraphSAINT, we optimize the number of layers within {1, 3}. |
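
The downstream search space quoted in the Experiment Setup row is small enough to enumerate directly. The following is a minimal sketch, not the authors' code: the function name `downstream_grid` and the dictionary keys are hypothetical, and it assumes that `{1, 3}` denotes two discrete layer counts rather than a range from 1 to 3.

```python
from itertools import product

# Values quoted in the Experiment Setup row (Appendix E.4 of the paper).
LEARNING_RATES = [0.01, 0.001]
# Assumption: {1, 3} is read as two discrete choices, not the range 1..3.
NUM_LAYERS = [1, 3]
# The row mentions a layer search only for these three models.
MODELS_WITH_LAYER_SEARCH = {"MLP", "GraphSAGE", "GraphSAINT"}


def downstream_grid(model_name):
    """Return candidate configurations for one downstream model (hypothetical helper)."""
    if model_name in MODELS_WITH_LAYER_SEARCH:
        return [
            {"model": model_name, "lr": lr, "num_layers": n}
            for lr, n in product(LEARNING_RATES, NUM_LAYERS)
        ]
    # For every other model, the row only describes a learning-rate search.
    return [{"model": model_name, "lr": lr} for lr in LEARNING_RATES]


if __name__ == "__main__":
    for name in ("MLP", "SGC"):
        for config in downstream_grid(name):
            print(config)
```

Under these assumptions, a model with a layer search has four candidate configurations and any other model has two. The quoted text does not say how the best configuration was selected, so the sketch stops at enumeration.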