PRODIGY: Enabling In-context Learning Over Graphs
Authors: Qian Huang, Hongyu Ren, Peng Chen, Gregor Kržmanc, Daniel Zeng, Percy S. Liang, Jure Leskovec
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide empirical evidence of the effectiveness of our framework by showcasing its strong in-context learning performance on tasks involving citation networks and knowledge graphs. Our approach outperforms the in-context learning accuracy of contrastive pretraining baselines with hard-coded adaptation by 18% on average across all setups. Moreover, it also outperforms standard finetuning with limited data by 33% on average with in-context learning. |
| Researcher Affiliation | Academia | Qian Huang¹ (qhwang@cs.stanford.edu), Hongyu Ren¹ (hyren@cs.stanford.edu), Peng Chen¹ (pengc@stanford.edu), Gregor Kržmanc² (gregor.krzmanc@ijs.si), Daniel Zeng¹ (dzeng@cs.stanford.edu), Percy Liang¹ (pliang@cs.stanford.edu), Jure Leskovec¹ (jure@cs.stanford.edu); ¹Stanford University, ²University of Ljubljana |
| Pseudocode | No | The paper describes the algorithmic steps and architecture in detail using prose and mathematical equations, but it does not include a dedicated pseudocode block or algorithm box. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of source code for the described methodology. |
| Open Datasets | Yes | For pretraining, we use two datasets: MAG240M [5], a large scale citation network with 122 million nodes and 1.3 billion edges; and Wiki, a knowledge graph (KG) constructed from Wikipedia [22] with 4.8 million nodes and 5.9 million edges. |
| Dataset Splits | Yes | Each of the downstream classification datasets has its original train, validation, and test splits. |
| Hardware Specification | Yes | We use one NVIDIA A100-SXM4-80GB GPU for all our experiments. |
| Software Dependencies | No | The paper mentions the use of pretrained language models like 'RoBERTa [12] base model' and 'MPNet [15]' but does not provide specific version numbers for these or any other software dependencies such as programming languages or libraries. |
| Experiment Setup | Yes | Our pretraining setup included a model with an input dimension of 768 and an embedding dimension of 256, a batch size of 1, and the AdamW optimizer with a learning rate of 1 × 10⁻³ and weight decay of 1 × 10⁻³; a pretraining task with 30 ways, 3 shots, and 4 queries per task; and checkpointing every 500 steps. This consistent configuration was applied across all the methods for fair comparison. To augment the data, we use DropNode and MaskNode augmentations with a probability of 0.5 per node for each method. (A configuration sketch based on these reported values follows the table.) |
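
Since the paper does not release code, the following is a minimal, hypothetical sketch of the reported pretraining configuration. The `PretrainConfig` dataclass, `make_optimizer` helper, and all field names are illustrative assumptions, not the authors' implementation; only the numeric values (dimensions, batch size, AdamW hyperparameters, task shape, checkpoint interval, augmentation probabilities) come from the paper's description.

```python
# Hypothetical sketch of the reported PRODIGY pretraining configuration.
# PretrainConfig, make_optimizer, and the placeholder encoder are assumptions
# for illustration; they are not taken from the authors' (unreleased) code.
from dataclasses import dataclass

import torch
from torch import nn


@dataclass
class PretrainConfig:
    input_dim: int = 768         # input node feature dimension (e.g. LM embeddings)
    embed_dim: int = 256         # output embedding dimension
    batch_size: int = 1
    lr: float = 1e-3             # AdamW learning rate
    weight_decay: float = 1e-3   # AdamW weight decay
    n_way: int = 30              # classes per pretraining task
    n_shot: int = 3              # support examples per class
    n_query: int = 4             # query examples per task
    checkpoint_every: int = 500  # steps between checkpoints
    drop_node_p: float = 0.5     # DropNode augmentation probability per node
    mask_node_p: float = 0.5     # MaskNode augmentation probability per node


def make_optimizer(model: nn.Module, cfg: PretrainConfig) -> torch.optim.AdamW:
    """Build an AdamW optimizer with the reported hyperparameters."""
    return torch.optim.AdamW(model.parameters(), lr=cfg.lr, weight_decay=cfg.weight_decay)


if __name__ == "__main__":
    cfg = PretrainConfig()
    # Placeholder encoder standing in for the paper's graph model, only to show
    # how the input/embedding dimensions and optimizer settings fit together.
    encoder = nn.Linear(cfg.input_dim, cfg.embed_dim)
    optimizer = make_optimizer(encoder, cfg)
    print(optimizer)
```
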