Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Inductive Representation Learning on Large Graphs
Authors: Will Hamilton, Zhitao Ying, Jure Leskovec
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions. |
| Researcher Affiliation | Academia | William L. Hamilton EMAIL Rex Ying EMAIL Jure Leskovec EMAIL Department of Computer Science Stanford University Stanford, CA, 94305 |
| Pseudocode | Yes | Algorithm 1: GraphSAGE embedding generation (i.e., forward propagation) algorithm |
| Open Source Code | Yes | Code and links to the datasets: http://snap.stanford.edu/graphsage/ |
| Open Datasets | Yes | We use an undirected citation graph dataset derived from the Thomson Reuters Web of Science Core Collection, corresponding to all papers in six biology-related fields for the years 2000-2005. ... We constructed a graph dataset from Reddit posts made in the month of September, 2014. ... Code and links to the datasets: http://snap.stanford.edu/graphsage/ |
| Dataset Splits | Yes | We train all the algorithms on the 2000-2004 data and use the 2005 data for testing (with 30% used for validation). ... We use the first 20 days for training and the remaining days for testing (with 30% used for validation). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions that models were implemented in TensorFlow. |
| Software Dependencies | No | All models were implemented in TensorFlow [1] with the Adam optimizer [16] (except DeepWalk, which performed better with the vanilla gradient descent optimizer). We used GenSim word2vec implementation [30] and GloVe Common Crawl word vectors [27]. However, specific version numbers for these software components are not explicitly provided. |
| Experiment Setup | Yes | For all the GraphSAGE variants we used rectified linear units as the non-linearity and set K = 2 with neighborhood sample sizes S1 = 25 and S2 = 10 (see Section 4.4 for sensitivity analyses). |
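
To make the Experiment Setup row concrete, the following is a minimal sketch of fixed-size neighborhood sampling with K = 2 and sample sizes 25 and 10. It is not the authors' implementation: the function names, the toy ring graph, sampling with replacement for low-degree nodes, and the assignment of 25 to the first hop and 10 to the second are all assumptions made for illustration.

```python
import random

# Values quoted in the Experiment Setup row; which hop each size
# applies to is an assumption of this sketch.
K = 2
SAMPLE_SIZES = [25, 10]


def sample_neighbors(adj, node, size, rng):
    """Draw exactly `size` neighbors of `node` (hypothetical helper)."""
    neighbors = adj[node]
    if not neighbors:
        # Assumption: an isolated node falls back to itself.
        return [node] * size
    if len(neighbors) >= size:
        # Enough neighbors: sample without replacement.
        return rng.sample(neighbors, size)
    # Too few neighbors: sample with replacement to keep the size fixed.
    return [rng.choice(neighbors) for _ in range(size)]


def sample_receptive_field(adj, batch, sample_sizes, rng):
    """Return a list of layers: layers[0] is the batch, layers[k] the k-hop samples."""
    layers = [list(batch)]
    for size in sample_sizes:
        frontier = []
        for node in layers[-1]:
            frontier.extend(sample_neighbors(adj, node, size, rng))
        layers.append(frontier)
    return layers


# Toy undirected ring graph with 100 nodes (illustrative data only).
n = 100
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

rng = random.Random(0)
layers = sample_receptive_field(adj, [0], SAMPLE_SIZES, rng)
layer_sizes = [len(layer) for layer in layers]
print(layer_sizes)  # [1, 25, 250]
```

Because every node contributes a fixed number of samples per hop, the per-node cost is bounded by the product of the sample sizes (here 25 × 10 = 250 two-hop samples) regardless of the node's true degree.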