Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Authors: Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across four open-source MLLMs and four long-video and streaming-video benchmarks, InfiniPot-V cuts peak GPU memory by up to 94%, sustains real-time generation, and matches or surpasses full-cache accuracy even in multi-turn dialogues. |
| Researcher Affiliation | Collaboration | Minsoo Kim¹, Kyuhong Shim², Jungwook Choi¹, Simyung Chang³ (¹Hanyang University, ²Sungkyunkwan University, ³Qualcomm AI Research, Qualcomm Korea YH) |
| Pseudocode | Yes | Algorithm 1: Continual KV Cache Compression (CKV) with InfiniPot-V |
| Open Source Code | No | The code and data involve proprietary components that are subject to company policy restrictions and thus cannot be publicly released. However, detailed implementation settings and reproduction instructions are documented in the main text and Appendices A and B. |
| Open Datasets | Yes | For OVU, we use representative multiple-choice long video understanding benchmarks (ranging from 3 minutes to over 2 hours): Video-MME [11], MLVU [54], LongVideoBench (LVB) [45], and EgoSchema [28]. For SVU, we employ the RVS-Ego/Movie streaming video QA benchmarks [49], which feature open-ended questions paired with timestamps, and evaluate the answers using GPT-3.5-turbo-0125, following [49, 35]. We further extend SVU evaluation to the multiple-choice benchmarks OVO-Bench [23] and StreamingBench [25]. |
| Dataset Splits | Yes | For MLVU and EgoSchema, we use the development sets for evaluation. For Video-MME, we report results on the version without subtitles. ... Video-MME: $\lvert M\rvert = \alpha\lvert \mathrm{TaR}\rvert + (1-\alpha)\lvert \mathrm{VaN}\rvert$, $\lvert M\rvert = 6\text{K}$ ... Short (up to 3 min) ... Medium (3–30 min) ... Long (30–120 min) |
| Hardware Specification | Yes | Measured on a single A100 80GB GPU. ... conducted on a single NVIDIA A100-80GB GPU using PyTorch. ... We evaluate our CKV framework, which combines continual KV compression with TaR and VaN scoring, on the NVIDIA Jetson AGX Orin [19] using Qwen-2.5-VL-3B and 10-minute StreamingBench [25] videos (0.2–1 FPS). |
| Software Dependencies | No | The experiments, averaged over five runs with three warmup iterations, compare the performance of memory-unconstrained (Cases 1 and 2) and memory-constrained (Case 3) approaches across various context lengths. For memory-unconstrained methods, we observe linear growth in memory requirements, from 21.29 GB at 5K tokens to 79.38 GB at 100K tokens, accompanied by an increase in time-to-first-token (TTFT) from 0.98 to 3.27 seconds. |
| Experiment Setup | Yes | For Qwen-2-VL [43], which supports dynamic image resizing based on the number of frames, we use the hyperparameter configuration reported to yield the best performance in their original work: FPS_MAX_FRAMES = 768, VIDEO_MIN_PIXEL = 128 × 28 × 28, and VIDEO_MAX_PIXEL = 768 × 28 × 28. ... We standardize the hyperparameter values at α = 0.5, r = 0.125, and $\lvert M\rvert / \lvert C\rvert = 0.75$ for all main experimental results when evaluating InfiniPot-V. |
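
The Pseudocode and Dataset Splits rows refer to Algorithm 1 (continual KV cache compression) and to a budget split between TaR and VaN, but the table does not reproduce either. The following is a minimal, simplified sketch, not the authors' Algorithm 1. It reads the budget formula as reserving a fraction alpha of the compressed budget for tokens ranked by a temporal-redundancy proxy and filling the rest by value-vector L2 norm. The scoring formulas, the per-token reference key (`ref_keys`), the capacity trigger, and all function names (`select_tokens`, `stream`) are hypothetical and chosen only for illustration.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Vec = Sequence[float]


def l2(v: Vec) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Vec, b: Vec) -> float:
    na, nb = l2(a), l2(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def select_tokens(keys: List[Vec], values: List[Vec], ref_keys: List[Vec],
                  budget: int, alpha: float) -> List[int]:
    """Return sorted indices of at most `budget` tokens to keep (hypothetical helper)."""
    n = len(keys)
    if n <= budget:
        return list(range(n))
    n_tar = round(alpha * budget)
    # Assumed TaR proxy: a key that differs from its reference key (e.g. the
    # same patch position in a recent frame) is treated as non-redundant.
    tar = [1.0 - cosine(k, r) for k, r in zip(keys, ref_keys)]
    keep = set(sorted(range(n), key=lambda i: tar[i], reverse=True)[:n_tar])
    # Assumed VaN proxy: fill the remaining slots by value-vector L2 norm.
    rest = sorted((i for i in range(n) if i not in keep),
                  key=lambda i: l2(values[i]), reverse=True)
    keep.update(rest[:budget - n_tar])
    return sorted(keep)


Frame = Tuple[List[Vec], List[Vec], List[Vec]]  # (keys, values, ref_keys)


def stream(frames: Iterable[Frame], capacity: int, budget: int,
           alpha: float) -> Tuple[List[Vec], List[Vec]]:
    """Append each frame's tokens; compress back to `budget` tokens whenever
    the cache exceeds `capacity` (hypothetical trigger policy)."""
    ks: List[Vec] = []
    vs: List[Vec] = []
    rs: List[Vec] = []
    for k, v, r in frames:
        ks, vs, rs = ks + k, vs + v, rs + r
        if len(ks) > capacity:
            idx = select_tokens(ks, vs, rs, budget, alpha)
            ks = [ks[i] for i in idx]
            vs = [vs[i] for i in idx]
            rs = [rs[i] for i in idx]
    return ks, vs
```

Under this reading, the reported α = 0.5 with a 6K budget would reserve about 3K tokens for each criterion. Because compression runs whenever the cache overflows, memory stays bounded by `capacity` regardless of video length, which is the property the Research Type row attributes to InfiniPot-V.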
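
The Experiment Setup row writes the Qwen-2-VL pixel bounds as products such as 128 × 28 × 28. The short calculation below expands them and converts them to approximate visual-token counts. It assumes Qwen-2-VL uses 14-pixel ViT patches merged 2 × 2, so that one visual token covers a 28 × 28 pixel region per temporal patch. The helper `tokens_for` is hypothetical, and the constant names simply mirror those quoted in the row.

```python
# Values quoted in the Experiment Setup row (names kept as written there).
FPS_MAX_FRAMES = 768
VIDEO_MIN_PIXEL = 128 * 28 * 28
VIDEO_MAX_PIXEL = 768 * 28 * 28

# Assumption: 14-pixel patches merged 2 x 2, so one visual token covers a
# 28 x 28 pixel region (per temporal patch of frames).
PATCH_SIZE = 14
MERGE_SIZE = 2
PIXELS_PER_TOKEN = (PATCH_SIZE * MERGE_SIZE) ** 2  # 784


def tokens_for(pixels: int) -> int:
    """Approximate visual tokens for a frame area of `pixels` pixels."""
    return pixels // PIXELS_PER_TOKEN


if __name__ == "__main__":
    print(VIDEO_MIN_PIXEL, VIDEO_MAX_PIXEL)                          # 100352 602112
    print(tokens_for(VIDEO_MIN_PIXEL), tokens_for(VIDEO_MAX_PIXEL))  # 128 768
```

Writing the bounds as multiples of 28 × 28 therefore reads directly as a per-frame token range of roughly 128 to 768.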
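
The Software Dependencies row reports only the endpoints of the memory-unconstrained scaling curve (21.29 GB and 0.98 s at 5K tokens, 79.38 GB and 3.27 s at 100K tokens). Assuming the growth is exactly linear between those two points, which the row describes but does not tabulate, a straight-line fit gives per-token costs and rough intermediate estimates. The function names are hypothetical, and any value between the endpoints is an interpolation, not a measurement.

```python
from typing import Tuple

Point = Tuple[float, float]

# (context length in tokens, measured value) from the Software Dependencies row.
MEMORY_GB: Tuple[Point, Point] = ((5_000, 21.29), (100_000, 79.38))
TTFT_S: Tuple[Point, Point] = ((5_000, 0.98), (100_000, 3.27))


def linear_estimate(x: float, p0: Point, p1: Point) -> float:
    """Straight-line interpolation through two measured points (assumes linearity)."""
    (x0, y0), (x1, y1) = p0, p1
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def slope_per_1k(p0: Point, p1: Point) -> float:
    """Increase in the measured quantity per additional 1K tokens."""
    (x0, y0), (x1, y1) = p0, p1
    return 1_000 * (y1 - y0) / (x1 - x0)


if __name__ == "__main__":
    print(round(slope_per_1k(*MEMORY_GB), 3))             # GB per extra 1K tokens
    print(round(linear_estimate(50_000, *MEMORY_GB), 2))  # estimated GB at 50K tokens
    print(round(slope_per_1k(*TTFT_S), 4))                # seconds per extra 1K tokens
```

Under this assumption, memory grows by roughly 0.61 GB and TTFT by roughly 0.024 s per additional 1K tokens, and the fitted line passes about 48.8 GB at 50K tokens.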