Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation

Authors: Haoran Luo, Haihong E, Guanting Chen, Yandan Zheng, Xiaobao Wu, Yikai Guo, Qika Lin, Yu Feng, Zemin Kuang, Meina Song, Yifan Zhu, Anh Tuan Luu

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments across medicine, agriculture, computer science, and law demonstrate that HyperGraphRAG outperforms both standard RAG and previous graph-based RAG methods in answer accuracy, retrieval efficiency, and generation quality. Our data and code are publicly available. To validate the effectiveness, we conduct experiments in multiple knowledge-intensive domains [7], including medicine, agriculture, computer science, and law.
Researcher Affiliation	Collaboration	(1) Beijing University of Posts and Telecommunications; (2) Nanyang Technological University; (3) Beijing Institute of Computer Technology and Application; (4) National University of Singapore; (5) China Mobile Research Institute; (6) Beijing Anzhen Hospital, Capital Medical University
Pseudocode	Yes	As shown in Algorithm 1, we first construct a knowledge hypergraph from raw documents via LLM-based extraction of n-ary relational facts. Algorithm 2: Hypergraph Retrieval and Generation. Require: query q, knowledge hypergraph G_H = (V, E_H). Ensure: final answer y.
Open Source Code	Yes	Our data and code are publicly available (footnote 1: https://github.com/LHRLAB/HyperGraphRAG).
Open Datasets	Yes	To evaluate the performance of HyperGraphRAG across multiple domains, we select four knowledge contexts from UltraDomain [19], as used in LightRAG [7]: Agriculture, Computer Science (CS), Legal, and a mixed domain (Mix). In addition, we include the latest international hypertension guidelines [16] as the foundational knowledge for the Medicine domain.
Dataset Splits	Yes	Specifically, for each domain, we sample a total of 512 questions, consisting of: Binary Source (256 samples): 128 facts are selected via 1-hop traversal, 64 facts via 2-hop traversal, 64 facts via 3-hop traversal. ... N-ary Source (256 samples): 128 facts are sampled via 1-hop traversal, 64 facts via 2-hop traversal, 64 facts via 3-hop traversal.
Hardware Specification	Yes	All experiments were conducted on a server with an 80-core CPU and 512GB RAM.
Software Dependencies	Yes	We use OpenAI's GPT-4o-mini for extraction and generation, and text-embedding-3-small for vector.
Experiment Setup	Yes	During retrieval, we set the following parameters: entity retrieval k_V = 60, τ_V = 50; hyperedge retrieval k_H = 60, τ_H = 5; and chunk retrieval k_C = 5, τ_C = 0.5. All methods are run using 16 parallel cores and the same generation model (GPT-4o-mini) with temperature 1.0 and a maximum generation length of 32k tokens.
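
For readers who want to record the reported settings in machine-readable form, the following is a minimal sketch, not the authors' code. It gathers the values quoted in the Dataset Splits, Software Dependencies, and Experiment Setup rows. All class, field, and function names are hypothetical; the lowercase model identifier is an assumption about how the model would be named in an API call; and the quoted text does not define the τ parameters, so they are stored under neutral `tau_*` names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; field names are illustrative only."""
    k_entity: int = 60           # k_V, entity retrieval
    tau_entity: float = 50       # τ_V (meaning not defined in the quoted text)
    k_hyperedge: int = 60        # k_H, hyperedge retrieval
    tau_hyperedge: float = 5     # τ_H
    k_chunk: int = 5             # k_C, chunk retrieval
    tau_chunk: float = 0.5       # τ_C
    parallel_cores: int = 16
    generation_model: str = "gpt-4o-mini"  # assumed API-style spelling of GPT-4o-mini
    embedding_model: str = "text-embedding-3-small"
    temperature: float = 1.0
    max_generation_length: str = "32k"     # exact token count is not given

# Per-domain question counts from the Dataset Splits row:
# each source type contributes 128 one-hop, 64 two-hop, and 64 three-hop samples.
SAMPLES_PER_HOP = {1: 128, 2: 64, 3: 64}
SOURCE_TYPES = ("binary", "n-ary")

def questions_per_source() -> int:
    """Number of questions drawn from one source type."""
    return sum(SAMPLES_PER_HOP.values())

def questions_per_domain() -> int:
    """Number of questions across both source types for one domain."""
    return len(SOURCE_TYPES) * questions_per_source()
```

The two helper functions only check that the quoted hop counts add up to the stated totals of 256 samples per source type and 512 questions per domain.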