Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment

Authors: Yuxing Lu, Wei Wu, Xukai Zhao, Rui Peng, Jinzhuo Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on 1,200 PubMed articles from three different domains demonstrate the effectiveness of KARMA in knowledge graph enrichment, with the identification of up to 38,230 new entities while achieving 83.1% LLM-verified correctness and reducing conflict edges by 18.6% through multi-layer assessments.
Researcher Affiliation	Academia	Yuxing Lu (1, 2), Wei Wu (1), Xukai Zhao (3), Rui Peng (1), Jinzhuo Wang (1). (1) Department of Big Data and Biomedical AI, Peking University, Beijing, China; (2) Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, USA; (3) School of Architecture, Tsinghua University, Beijing, China. Corresponding author: EMAIL
Pseudocode	Yes	B.3 Ingestion Agent (IA) Prompt Title: IA_Prompt Role Description: You are a clinical expert. Your responsibility is to: 1. Retrieve raw publications from designated sources (e.g., PubMed, internal repositories). 2. Convert various file formats (PDF, HTML, XML) into a consistent normalized text format. 3. Extract metadata such as the title, authors, journal/conference name, publication date, and unique identifiers (DOI, PubMed ID). System Instruction: Input: Raw document payload or a path/URL to the document, plus minimal metadata (if available). Output: A JSON structure with two main fields: 1. "metadata": {title, authors, journal, pub_date, doi, pmid, etc.} 2. "content": A single string or a structured array containing the full text, preserving headings or major sections if possible. Key Requirements: Handle OCR artifacts if the PDF is scanned (e.g., correct typical OCR errors where possible). Normalize non-ASCII characters (Greek letters, special symbols) to ASCII or minimal LaTeX markup when relevant (e.g., \alpha). If certain fields cannot be extracted, leave them as empty or "N/A" but do not remove the key from the JSON. Error Handling: In case of partial or unreadable text, mark the corrupted portions with placeholders (e.g., [UNREADABLE]). If the document is locked or inaccessible, set an error flag in the output JSON. LLM Prompt Template (Illustrative Example): [System Role: Ingestion Agent] You will receive a raw publication in PDF or HTML format. 1. Extract all available metadata: Title, Authors, Date, Journal/Source, PMID, DOI. 2. Convert the text to ASCII or minimal LaTeX. 3. Provide a JSON output with keys: {"metadata": {...}, "content": "..." }. 4. If any portion of the text is unreadable, replace it with "[UNREADABLE]". Sample Input: pdf_document: "Binary PDF data...", doi: "10.1000/j.jmb.2022.07.123" Sample Output: { "metadata": {"title": "Novel Anti-viral Therapy", "authors": ["Jane Doe"], "content": "Introduction\nRecent advances in... Methods\nWe tested..." }
Open Source Code	Yes	The paper has deposited the code in the Supplementary Materials and will open-source the code once accepted.
Open Datasets	Yes	We curate scientific publications from PubMed [27] across three primary domains: the Genomics Corpus, which includes 720 papers focused on gene variants, regulatory elements, and sequencing studies; the Proteomics Corpus, comprising 360 papers related to protein structures, functions, and protein-interaction networks; and the Metabolomics Corpus, containing 120 papers discussing metabolic pathways, metabolite profiling, and clinical applications. All articles are stored in PDF format and processed by the Ingestion Agent within KARMA.
Dataset Splits	No	The paper does not explicitly provide training/test/validation dataset splits for the KARMA framework's components. It describes the datasets used for evaluation but not how these were split for training purposes of the underlying models within the framework.
Hardware Specification	No	The paper mentions using LLMs as a backbone for KARMA's multi-agent knowledge graph enrichment pipeline using their APIs, implying external cloud-based resources. It also provides a "Cost Analysis" section discussing prompt tokens, completion tokens, and processing time (Figure 3), but no specific hardware details (like GPU/CPU models or memory) used by the authors to run their experiments are provided.
Software Dependencies	Yes	We evaluate three general-purpose LLMs as the backbone for KARMA's multi-agent knowledge graph enrichment pipeline using their APIs. GLM-4 [7]: An open-source 9B-parameter model... GPT-4o [1]: A proprietary multimodal model... DeepSeek-v3 [15]: An open-source 37-billion-activated-parameter mixture-of-experts (MoE) model...
Experiment Setup	Yes	Reader Agents (RA): ...RA discards segments if $R(s_j) < \delta$, where $\delta$ is a domain-calibrated threshold. Relationship Extraction Agents (REA): ...We select any relationship $r$ for which $p(r \mid \hat{e}_i, \hat{e}_j) \ge \theta_r$ and form a triplet $(\hat{e}_i, r, \hat{e}_j)$. Evaluator Agents (EA): ...$\text{integrate}(t) = \begin{cases} 1, & \text{if } \frac{C(t) + Cl(t) + R(t)}{3} \ge \Theta \\ 0, & \text{otherwise} \end{cases}$
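
The three threshold rules quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Triplet`, `keep_segment`, `select_relations`, and `integrate` are hypothetical, and so are all numeric values. The scores C(t), Cl(t), and R(t) are treated as opaque numbers because the table does not define them, and the relationship comparison is assumed to be "greater than or equal to", matching the evaluator rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """Hypothetical container for a candidate triplet and its three scores."""
    head: str
    relation: str
    tail: str
    c: float   # C(t) in the quoted rule
    cl: float  # Cl(t) in the quoted rule
    r: float   # R(t) in the quoted rule


def keep_segment(score: float, delta: float) -> bool:
    """RA rule: a segment is discarded when R(s_j) < delta, kept otherwise."""
    return score >= delta


def select_relations(probs: dict[str, float],
                     thresholds: dict[str, float]) -> list[str]:
    """REA rule (assumed >=): keep each relation r with p(r | e_i, e_j) >= theta_r."""
    return [rel for rel, p in probs.items() if p >= thresholds[rel]]


def integrate(t: Triplet, big_theta: float) -> int:
    """EA rule: 1 if the mean of the three scores reaches Theta, else 0."""
    mean_score = (t.c + t.cl + t.r) / 3
    return 1 if mean_score >= big_theta else 0


if __name__ == "__main__":
    # Illustrative values only; none of them come from the paper.
    candidate = Triplet("GeneA", "regulates", "ProteinB", c=0.9, cl=0.8, r=0.7)
    print(keep_segment(0.4, delta=0.5))
    print(select_relations({"regulates": 0.9, "inhibits": 0.2},
                           {"regulates": 0.5, "inhibits": 0.5}))
    print(integrate(candidate, big_theta=0.6))
```

Keeping each rule in its own function mirrors the agent split described in the row, so a threshold such as δ or Θ could be tuned per domain without touching the other two rules.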