Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Talk like a Graph: Encoding Graphs for Large Language Models
Authors: Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we perform the first comprehensive study of encoding graph-structured data as text for consumption by LLMs. We show that LLM performance on graph reasoning tasks varies on three fundamental levels: (1) the graph encoding method, (2) the nature of the graph task itself, and (3) interestingly, the very structure of the graph considered. These novel results provide valuable insight on strategies for encoding graphs as text. Using these insights we illustrate how the correct choice of encoders can boost performance on graph reasoning tasks inside LLMs by 4.8% to 61.8%, depending on the task. |
| Researcher Affiliation | Industry | Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi Google Research EMAIL |
| Pseudocode | No | The paper describes methods and processes in text and with diagrams, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The code to generate the data is available at https://github.com/google-research/google-research/tree/master/graphqa. We are committed to open-sourcing both our code and data upon the acceptance of our paper. |
| Open Datasets | Yes | Graph QA is distinguished by using graphs with much more varied and realistic graph structure than has previously been studied with LLMs1. 1The code to generate the data is available at https://github.com/google-research/google-research/tree/master/graphqa. |
| Dataset Splits | No | The paper mentions generating graphs and using 'few-shot examples' for prompting, but does not provide specific details on how the generated Graph QA data is split into training, validation, and test sets, or specify cross-validation methods. |
| Hardware Specification | Yes | For our experiments, we used PaLM 62B and PaLM 2 (various sizes) served on a 4x4 TPU v4 architecture. |
| Software Dependencies | Yes | We used the NetworkX library (Hagberg et al., 2008) to generate the random graphs and to find the answers to the graph tasks. |
| Experiment Setup | Yes | The decoding temperature was set to zero. |
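The paper's pipeline, as described in the excerpts above, generates random graphs with NetworkX and encodes them as text for an LLM prompt. A minimal sketch of that idea is shown below; the function name `encode_graph_as_text` and the specific edge-listing phrasing are illustrative assumptions, not the paper's actual encoders.

```python
import networkx as nx

def encode_graph_as_text(graph: nx.Graph) -> str:
    """Encode a graph as plain text by listing its edges,
    one simple style among the many the paper compares."""
    nodes = ", ".join(str(n) for n in sorted(graph.nodes()))
    edges = ". ".join(
        f"Node {u} is connected to node {v}"
        for u, v in sorted(graph.edges())
    )
    return f"G describes a graph among nodes {nodes}. {edges}."

# Generate a small Erdos-Renyi random graph with a fixed seed,
# then produce the text encoding that would be placed in a prompt.
g = nx.erdos_renyi_graph(n=5, p=0.5, seed=42)
prompt = encode_graph_as_text(g)
print(prompt)
```

The paper's central finding is that the choice of such an encoding function materially changes LLM accuracy on graph reasoning tasks (by 4.8% to 61.8%, depending on the task).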