Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Microstructures and Accuracy of Graph Recall by Large Language Models
Authors: Yanbang Wang, Hejie Cui, Jon Kleinberg
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we perform the first systematical study of graph recall by LLMs, investigating the accuracy and biased microstructures (local subgraph patterns) in their recall. |
| Researcher Affiliation | Academia | Yanbang Wang Cornell University EMAIL Hejie Cui Stanford University EMAIL Jon Kleinberg Cornell University EMAIL |
| Pseudocode | No | The paper describes experimental protocols in narrative form and figures, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and data can be downloaded at: https://github.com/Abel0828/llm-graph-recall. |
| Open Datasets | Yes | We create five graph datasets from the following application domains. (1) Co-authorship: DBLP (1995-2005); (2) Social network: Facebook [27]; (3) Geological network: CA road; (4) Protein interactions: Reactome [16]; (5) Erdős-Rényi graph: as in [18]. |
| Dataset Splits | No | The paper mentions train/test splits (e.g., 20% of edges removed for link prediction) but does not explicitly state validation splits, their percentages, or the splitting methodology. |
| Hardware Specification | Yes | For Llama Family models, we use the open-sourced models meta-llama/Llama-2-7b-hf and meta-llama/Llama-2-13b-hf on Hugging Face, tuned on two Quadro RTX 8000 GPUs with 48 GB of RAM. |
| Software Dependencies | No | The paper lists the LLM models and APIs used (e.g., GPT-3.5, GPT-4, Gemini-Pro, Llama 2), but does not provide specific version numbers for ancillary software dependencies such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We use zero-shot prompting with moderate formatting instructions for answers. |
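The Dataset Splits row notes that the paper removes 20% of edges for link prediction without fully specifying how the split is made. As an illustration only, the sketch below shows how such a holdout could be pinned down reproducibly with an explicit fraction and random seed. The helper name `split_edges`, the default seed, and the canonicalization of undirected edges are assumptions for this example, not the authors' code.

```python
import random


def split_edges(edges, test_frac=0.2, seed=0):
    """Hold out a fraction of undirected edges for link-prediction testing.

    Hypothetical helper: the 20% default comes from the assessment above;
    the fixed seed and sorting step are assumptions that make the split
    deterministic and therefore reportable.
    """
    # Canonicalize undirected edges so (u, v) and (v, u) count as one edge,
    # then sort so the shuffle starts from a fixed order.
    canonical = sorted({tuple(sorted(e)) for e in edges})
    rng = random.Random(seed)  # local RNG; does not touch global random state
    rng.shuffle(canonical)
    n_test = round(len(canonical) * test_frac)
    test = canonical[:n_test]
    train = canonical[n_test:]
    return train, test


if __name__ == "__main__":
    # A 10-node cycle has 10 edges, so a 20% holdout keeps 2 for testing.
    cycle = [(i, (i + 1) % 10) for i in range(10)]
    train, test = split_edges(cycle)
    print(len(train), len(test))
```

Reporting the fraction, the seed, and the canonicalization rule together is what would let another group regenerate the same train/test edge sets.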