Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ThinK: Thinner Key Cache by Query-Driven Pruning
Authors: Yuhui Xu, Zhanming Jie, Hanze Dong, Lei Wang, Xudong Lu, Aojun Zhou, Amrita Saha, Caiming Xiong, Doyen Sahoo
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations on the LLaMA and Mistral models across various long-sequence datasets verified the efficiency of THINK. |
| Researcher Affiliation | Collaboration | 1 Salesforce AI Research; 2 The Chinese University of Hong Kong |
| Pseudocode | No | The paper describes methods using mathematical formulations and prose. It does not contain an explicitly labeled 'Pseudocode' or 'Algorithm' section, nor does it present structured, step-by-step procedures in a code-like format. |
| Open Source Code | Yes | Our code has been made available at https://github.com/SalesforceAIResearch/ThinK. |
| Open Datasets | Yes | We evaluate our proposed method against state-of-the-art KV cache compression methods on two widely recognized benchmarks: LongBench and Needle-in-a-Haystack. LongBench (Bai et al., 2023) is designed to comprehensively assess the long context understanding capabilities of LLMs... Needle-in-a-Haystack (Kamradt, 2023) is a recently developed benchmark... |
| Dataset Splits | No | The paper evaluates models on established benchmarks like LongBench and Needle-in-a-Haystack, but does not provide specific training/test/validation dataset splits, percentages, or sample counts used for these benchmarks within the text. |
| Hardware Specification | Yes | All the experiments are conducted using NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper mentions using specific LLM models (LLaMA-2/3, Mistral) accessible via Hugging Face, but does not provide specific version numbers for any software dependencies like Hugging Face, PyTorch, Python, or CUDA. |
| Experiment Setup | Yes | For instance, when comparing SnapKV and SnapKV integrated with THINK, we used a maximum pooling kernel size of 7 and an observation window size of 32, maintaining the same KV-size for both configurations. ... We generate synthetic workloads with an input prompt length of 160 and an output length of 338. We set a batch size 300 for both KIVI and our method. |