Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Frequency-Aware Token Reduction for Efficient Vision Transformer

Authors: DongJae Lee, Jiwan Hur, Jaehyun Choi, Jaemyung Yu, Junmo Kim

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments and analysis, we demonstrate that our approach significantly improves accuracy while reducing computational overhead and mitigating rank collapsing and over smoothing. Furthermore, we analyze the previous methods, shedding light on their implicit frequency characteristics and limitations.
Researcher Affiliation | Collaboration | 1 KAIST, 2 NAVER AI Lab
Pseudocode | No | The paper describes methods and formulas but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks with structured steps.
Open Source Code | Yes | The code is available in https://github.com/jhtwosun/frequency-aware-token-pruning.
Open Datasets | Yes | We conduct experiments on ImageNet-1K [5] and compare our results with state-of-the-art methods.
Dataset Splits | Yes | To validate the effectiveness of our proposed method, we conduct experiments on ImageNet-1K [5] and compare our results with state-of-the-art methods. Specifically, we utilize pretrained models on ImageNet-1K and fine-tune them for 30 epochs.
Hardware Specification | Yes | All experiments are conducted in 8 NVIDIA RTX 4090 with mixed precision.
Software Dependencies | No | We use standard cross-entropy and self-distillation for loss term from an unpruned model, following [14, 21, 27]. For DC tokens, we additionally utilize positional embedding terms. For extra distillation from the large teacher model, we follow [12].
Experiment Setup | Yes | We conduct the experiments with AdamW with learning rate 0.0001 and weight decay 2 × 10^-5. The batch size is set to 128 per GPU. For other hyperparameters and data augmentations, we follow DeiT, except warmup epochs and stochastic depth, which we set as 0. For EMA, we set the decay factor to 0.9998. Additionally, we utilize the EMA model as distillation.
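
The quoted experiment setup can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the class name `FinetuneConfig`, its field names, and the helper `ema_update` are hypothetical. The weight decay of 2e-5 is a reading of the garbled value in the extracted text, and the global batch size assumes all 8 GPUs are used data-parallel, which the quote does not state. The EMA rule shown is the common exponential moving average form; the paper's exact implementation may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4
    weight_decay: float = 2e-5      # assumption: reconstructed from garbled "2 10 5"
    batch_size_per_gpu: int = 128
    num_gpus: int = 8               # from the Hardware Specification row
    epochs: int = 30                # fine-tuning epochs from the Dataset Splits row
    warmup_epochs: int = 0
    stochastic_depth: float = 0.0
    ema_decay: float = 0.9998

    @property
    def global_batch_size(self) -> int:
        # Assumption: every GPU processes one per-GPU batch per step.
        return self.batch_size_per_gpu * self.num_gpus


def ema_update(ema_params: List[float], model_params: List[float],
               decay: float) -> List[float]:
    """One common EMA step: ema <- decay * ema + (1 - decay) * param."""
    return [decay * e + (1.0 - decay) * p
            for e, p in zip(ema_params, model_params)]


cfg = FinetuneConfig()
ema = ema_update([1.0, 0.0], [0.0, 1.0], cfg.ema_decay)
```

With a decay of 0.9998, each step moves the averaged weights only 0.02% of the way toward the current weights, so the EMA model changes slowly, consistent with the summary's note that it serves as a distillation target.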