Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Frequency-Aware Token Reduction for Efficient Vision Transformer

Authors: DongJae Lee, Jiwan Hur, Jaehyun Choi, Jaemyung Yu, Junmo Kim

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments and analysis, we demonstrate that our approach significantly improves accuracy while reducing computational overhead and mitigating rank collapsing and over-smoothing. Furthermore, we analyze the previous methods, shedding light on their implicit frequency characteristics and limitations.
Researcher Affiliation	Collaboration	KAIST; NAVER AI Lab
Pseudocode	No	The paper describes methods and formulas but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code	Yes	The code is available in https://github.com/jhtwosun/frequency-aware-token-pruning.
Open Datasets	Yes	We conduct experiments on ImageNet-1K [5] and compare our results with state-of-the-art methods.
Dataset Splits	Yes	To validate the effectiveness of our proposed method, we conduct experiments on ImageNet-1K [5] and compare our results with state-of-the-art methods. Specifically, we utilize pretrained models on ImageNet-1K and fine-tune them for 30 epochs.
Hardware Specification	Yes	All experiments are conducted in 8 NVIDIA RTX 4090 with mixed precision.
Software Dependencies	No	We use standard cross-entropy and self-distillation for loss term from an unpruned model, following [14, 21, 27]. For DC tokens, we additionally utilize positional embedding terms. For extra distillation from the large teacher model, we follow [12].
Experiment Setup	Yes	We conduct the experiments with AdamW with learning rate 0.0001 and weight decay 2×10⁻⁵. The batch size is set to 128 per GPU. For other hyperparameters and data augmentations, we follow DeiT, except warmup epochs and stochastic depth, which we set as 0. For EMA, we set the decay factor to 0.9998. Additionally, we utilize the EMA model as distillation.
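The fine-tuning settings reported in the table can be gathered into one configuration object, which also makes the implied global batch size explicit. The following is a minimal, hypothetical sketch in plain Python, not code from the authors' repository: the names `FinetuneConfig` and `ema_update` are invented for illustration, it assumes synchronous data-parallel training across all 8 GPUs, and it assumes the standard EMA update rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    # Hypothetical container; values copied from the "Experiment Setup",
    # "Dataset Splits", and "Hardware Specification" rows above.
    learning_rate: float = 1e-4
    weight_decay: float = 2e-5
    batch_size_per_gpu: int = 128
    num_gpus: int = 8
    epochs: int = 30
    warmup_epochs: int = 0
    drop_path_rate: float = 0.0  # stochastic depth disabled
    ema_decay: float = 0.9998

    @property
    def effective_batch_size(self) -> int:
        # Global batch size, assuming synchronous data-parallel training.
        return self.batch_size_per_gpu * self.num_gpus


def ema_update(ema_params: list[float], model_params: list[float], decay: float) -> list[float]:
    """Standard EMA rule (assumed): ema <- decay * ema + (1 - decay) * current."""
    if len(ema_params) != len(model_params):
        raise ValueError("parameter lists must have the same length")
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, model_params)]


cfg = FinetuneConfig()
print(cfg.effective_batch_size)  # 1024

# One EMA step on toy scalar "parameters".
ema = ema_update([0.0, 1.0], [1.0, 1.0], cfg.ema_decay)
print(ema)
```

With a decay of 0.9998, the EMA averages over roughly 1/(1 − 0.9998) = 5,000 optimizer steps. In an actual PyTorch training loop the same update would be applied to every parameter tensor after each optimizer step.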