Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems

Authors: Hancheng Ye, Zhengqi Gao, Mingyuan Ma, Qinsi Wang, Yuzhe Fu, Ming-Yu Chung, Yueqian Lin, Zhijian Liu, Jianyi Zhang, Danyang Zhuo, Yiran Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | KVCOMM achieves over 70% reuse rate across diverse multiagent workloads, including retrieval-augmented generation, math reasoning, and collaborative coding tasks, all without quality degradation. Particularly, when each fully-connected agent receives 1K input tokens with 512 prefix tokens and 512 output tokens under a five-agent setting, KVCOMM achieves up to 7.8× speedup compared to the standard prefill pipeline, reducing TTFT from 430ms to 55ms. Code is available at https://github.com/FastMAS/KVCOMM. |
| Researcher Affiliation | Collaboration | Hancheng Ye¹, Zhengqi Gao², Mingyuan Ma¹, Qinsi Wang¹, Yuzhe Fu¹, Ming-Yu Chung¹, Yueqian Lin¹, Zhijian Liu³, Jianyi Zhang¹, Danyang Zhuo¹, Yiran Chen¹. ¹Duke University, ²MIT, ³NVIDIA |
| Pseudocode | Yes | The specific details of KVCOMM are shown in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/FastMAS/KVCOMM. |
| Open Datasets | Yes | Benchmark Datasets. We assess RAG performance using MMLU [13], math reasoning with GSM8K [7], and programming capability via HumanEval [4]. |
| Dataset Splits | No | The paper mentions using well-known benchmarks (MMLU, GSM8K, HumanEval) but does not explicitly detail the specific train/test/validation splits used for its experiments beyond referencing these standard datasets. While these benchmarks inherently have established splits, the paper does not specify how they were applied in this work. |
| Hardware Specification | Yes | Implementation Details. Experiments are executed on a single NVIDIA H100 GPU. |
| Software Dependencies | No | The paper mentions using "Hugging Face's framework" for open-source models but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The maximum generation length is uniformly set to 512 tokens, with hyperparameters selected as γ = 0.3 and anchor pool size V = 20. Further implementation specifics are detailed in Appendix 6.3. |
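
The roughly 7.8-fold speedup quoted in the Research Type row can be checked against the two TTFT figures reported in the same row (430 ms for the standard prefill pipeline, 55 ms with KVCOMM). The sketch below is only an arithmetic sanity check; the function and variable names are hypothetical and do not come from the paper or its repository.

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster the optimized latency is than the baseline."""
    if optimized_ms <= 0:
        raise ValueError("optimized latency must be positive")
    return baseline_ms / optimized_ms


# TTFT values as quoted in the Research Type row (hypothetical variable names).
ttft_baseline_ms = 430.0  # standard prefill pipeline
ttft_kvcomm_ms = 55.0     # with KVCOMM

ratio = speedup(ttft_baseline_ms, ttft_kvcomm_ms)
print(f"{ratio:.1f}x")  # prints "7.8x"
```

The ratio 430 / 55 rounds to 7.8, which agrees with the reported figure.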