Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Attribution-Driven Adaptive Token Pruning for Transformers

Authors: Yaoyao Yan, Hui Yu, Weizhi Xu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments conducted on GLUE, SQuAD, and 20News demonstrate that AD-TP outperforms state-of-the-art token pruning and model compression methods in both accuracy and computational efficiency.
Researcher Affiliation	Academia	Yaoyao Yan (1), Hui Yu (2), Weizhi Xu (1). (1) School of Information Science and Engineering, Shandong Normal University; (2) Business School, Shandong Normal University.
Pseudocode	No	The paper describes methods using mathematical equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	No	Answer: [No] Justification: Model weights are public. We intend to release the code pending internal review.
Open Datasets	Yes	Datasets and Evaluation Metrics. We evaluate AD-TP on 8 tasks from the GLUE benchmark and extend the evaluation to two long-text datasets: SQuAD v2.0 [31] and 20News [32].
Dataset Splits	Yes	Datasets and Evaluation Metrics. We evaluate AD-TP on 8 tasks from the GLUE benchmark and extend the evaluation to two long-text datasets: SQuAD v2.0 [31] and 20News [32]. Each GLUE task adopts task-specific evaluation metrics; SQuAD v2.0 is evaluated using the F1 score, while 20News uses accuracy. ... The teacher is trained for 5 epochs on each task, and the best validation checkpoint is used for evaluation.
Hardware Specification	Yes	All experiments are implemented in PyTorch with Huggingface Transformers on an NVIDIA RTX 3060. ... All experiments are implemented using the PyTorch framework and the Huggingface Transformers library on a single NVIDIA RTX 3060 GPU.
Software Dependencies	No	All experiments are implemented using the PyTorch framework and the Huggingface Transformers library on a single NVIDIA RTX 3060 GPU. The paper mentions software names (PyTorch, Huggingface Transformers) but does not provide specific version numbers.
Experiment Setup	Yes	Implementation Details. All experiments are implemented in PyTorch with Huggingface Transformers on an NVIDIA RTX 3060. The teacher is a 12-layer BERT-base, and the student uses either 6 or 12 layers. We tune learning rates and distillation weights (α, β, γ) across defined ranges. All hyperparameter configurations and dataset-specific settings are provided in Appendix A.3. ... Detailed task-specific configurations are reported in Table 7.