Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DToMA: Training-free Dynamic Token MAnipulation for Long Video Understanding
Authors: Bowen Yuan, Sisi You, Bing-Kun Bao
IJCAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 6 long video understanding benchmarks show that DToMA enhances both efficiency and comprehension, outperforming state-of-the-art methods and generalizing well across 3 VideoLLM architectures and sizes. |
| Researcher Affiliation | Academia | Bowen Yuan¹, Sisi You¹³, Bing-Kun Bao¹²; ¹Nanjing University of Posts and Telecommunications, ²Pengcheng Laboratory, ³State Key Laboratory of Tibetan Intelligence. EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Token Reorganization Strategy |
| Open Source Code | Yes | Code is available at https://github.com/yuanrr/DToMA. |
| Open Datasets | Yes | We conducted evaluations of our method on 6 long video understanding benchmarks, including Video-MME [Fu et al., 2024a], LongVideoBench [Wu et al., 2024], EgoSchema [Mangalam et al., 2023], MLVU [Zhou et al., 2024], NExT-QA [Xiao et al., 2021], and Perception Test [Patraucean et al., 2024]. |
| Dataset Splits | No | Following evaluation tool LMMs-Eval [Zhang et al., 2024a], we perform standardized evaluation settings and metrics, i.e., accuracy, on each benchmark. The paper refers to 'standardized evaluation settings' and external benchmarks but does not explicitly provide specific split percentages, sample counts, or detailed splitting methodology within its main text. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. It mentions the models and architectures used (e.g., LLaVA-Video-7B, SigLIP, Qwen2) but not the underlying computational hardware. |
| Software Dependencies | No | The paper mentions using LLaVA-Video-7B, SigLIP, and Qwen2 as core components but does not provide specific version numbers for these or other ancillary software libraries or programming languages required for replication. |
| Experiment Setup | Yes | For DToMA, the selected layers are r1 = 3, r2 ∈ [12, 18], r3 = 21. For TKR, following the optimal design [Du et al., 2024] for SigLIP, we use 2×2 pooling for keyframes and 3×3 pooling for coarse non-keyframes. The token budget B is pre-defined according to experimental requirements, and the token compression ratio adapts automatically to B. Unless otherwise specified, we set m = S = N/4. For V-Inj, we set threshold G = 0.75 and factor α = 0.25. |
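The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustrative reconstruction, not taken from the released DToMA code: all names (`DTOMA_CONFIG`, `default_m_and_s`) are hypothetical, and only the numeric values come from the paper.

```python
# Hypothetical configuration sketch of the reported DToMA hyperparameters.
# Key names are illustrative; values are as stated in the Experiment Setup row.
DTOMA_CONFIG = {
    "r1": 3,                          # first selected layer
    "r2_range": (12, 18),             # r2 is chosen from [12, 18]
    "r3": 21,                         # last selected layer
    "keyframe_pooling": (2, 2),       # 2x2 pooling for keyframes (SigLIP, TKR)
    "non_keyframe_pooling": (3, 3),   # 3x3 pooling for coarse non-keyframes
    "G": 0.75,                        # V-Inj threshold
    "alpha": 0.25,                    # V-Inj factor
}

def default_m_and_s(n_tokens: int) -> tuple[int, int]:
    """Default setting from the paper: unless otherwise specified, m = S = N/4."""
    m = s = n_tokens // 4
    return m, s
```

The token budget B is not listed here because the paper states it is pre-defined per experiment, with the compression ratio adapting to it.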