Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network

Authors: Shuo Xu, Yu Chen, Shuxia Lin, Xin Geng, Xu Yang

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on LLaMA and LLaVA to evaluate the robustness and effectiveness of FlowPrune. Our results show that FlowPrune achieves high agreement with the original attention flow in both absolute and relative error metrics, as well as in identifying influential input tokens. Finally, case studies in both NLP and vision domains demonstrate that FlowPrune produces consistent interpretability outcomes as the original Attention Flow, validating its practical utility.
Researcher Affiliation | Academia | Shuo Xu (Southeast University, EMAIL); Yu Chen (Southeast University, EMAIL); Shuxia Lin (Southeast University, EMAIL); Xin Geng (Southeast University, EMAIL); Xu Yang (Southeast University, EMAIL)
Pseudocode | No | A.4 Layer Selection via Dynamic Programming for FlowPrune. To identify which attention layers to compress in the FlowPrune framework, we formulate the selection process as a dynamic programming (DP) problem that aims to preserve critical regions in the attention flow graph while reducing computational cost. ... We define the dynamic programming state as: ... The recurrence relations are: ... The boundary conditions are: ... The optimal value is given by: ...
Open Source Code | Yes | All our code is publicly available, and you can find it at https://github.com/ATMxsp01/FlowPrune.
Open Datasets | Yes | The datasets used by these models to generate attention maps are GSM8K [37] and OKVQA [38], respectively. ... In this case, we implement the experiments on MRPC (Microsoft Research Paraphrase Corpus) [40] dataset with a 12-layer BERT model... we use the Deit-Small [41] model and implement the experiments on ILSVRC2012 [42], an image classification dataset. ... Using the QNLI [40] dataset from GLUE...
Dataset Splits | No | The datasets used by these models to generate attention maps are GSM8K [37] and OKVQA [38], respectively. For each model, we provided 1,000 inputs and randomly selected 100 token pairs from each generated attention map (including those compressed using FlowPrune) to calculate the corresponding results.
Hardware Specification | Yes | All our experiments were conducted on a single server equipped with 2 AMD EPYC 7453 28-Core Processors and 4 NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper does not explicitly provide specific software dependencies with version numbers.
Experiment Setup | Yes | We experiment with seven compression configurations: the full model (32 layers), an extreme compression setting (3 layers), and five intermediate retain rates of 80%, 75%, 50%, 30%, and 25%, which correspond to 26, 24, 16, 10, and 8 layers, respectively. The edge pruning threshold is set to 1 × 10⁻⁶ throughout all experiments.
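As a quick check on the Experiment Setup entry, the sketch below recomputes the layer counts from the five retain rates for a 32-layer model. The rounding rule (round to the nearest whole layer) is an assumption, chosen because it reproduces the reported counts; the source does not state it. The `prune_edges` helper is hypothetical and illustrates only one plausible reading of an edge pruning threshold: edges whose weight falls below the threshold, taken here as 1e-6, are zeroed out. It is not the authors' implementation.

```python
# Values quoted in the Experiment Setup entry.
TOTAL_LAYERS = 32
RETAIN_RATES = [0.80, 0.75, 0.50, 0.30, 0.25]


def layers_retained(rate, total=TOTAL_LAYERS):
    """Number of layers kept for a given retain rate.

    Assumption: round to the nearest whole layer. The source only
    reports the resulting counts, not the rounding rule.
    """
    return round(rate * total)


layer_counts = [layers_retained(r) for r in RETAIN_RATES]


def prune_edges(weights, threshold=1e-6):
    """Hypothetical helper: zero out edges with weight below `threshold`.

    `weights` is a list of rows of edge weights (for example, one
    attention map). This is an illustrative reading of "edge pruning
    threshold", not the method from the paper.
    """
    return [[w if w >= threshold else 0.0 for w in row] for row in weights]


if __name__ == "__main__":
    for rate, count in zip(RETAIN_RATES, layer_counts):
        print(f"retain rate {rate:.0%}: {count} of {TOTAL_LAYERS} layers")
```

Under the rounding assumption, the non-integer products 25.6 and 9.6 map to the reported 26 and 10 layers; the other three rates divide 32 evenly.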