Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Elastic ViTs from Pretrained Models without Retraining
Authors: Walter Simoncini, Michael Dorkenwald, Tijmen Blankevoort, Cees G. M. Snoek, Yuki Asano
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on DINO, SigLIPv2, DeiT, and AugReg models demonstrate superior performance over state-of-the-art methods across various sparsities, requiring less than five minutes on a single A100 GPU to generate elastic models that can be adjusted to any computational budget. Our key contributions include an efficient pruning strategy for pretrained Vision Transformers, a novel evolutionary approximation of Hessian off-diagonal structures, and a self-supervised importance scoring mechanism that maintains strong performance without requiring retraining or labels. |
| Researcher Affiliation | Collaboration | Walter Simoncini (1,2)\*, Michael Dorkenwald (2)\*, Tijmen Blankevoort (3), Cees G. M. Snoek (2), Yuki M. Asano (1). Affiliations: (1) University of Technology Nuremberg, (2) University of Amsterdam, (3) NVIDIA. |
| Pseudocode | Yes | The pseudocode for our algorithm is listed in Appendix D.2. Algorithm 1 outlines our single-shot pruning procedure. |
| Open Source Code | Yes | Code and pruned models are available at: https://elastic.ashita.nl/ Furthermore, we release the codebase used to run the experiments presented in this paper at https://github.com/WalterSimoncini/SnapViT. |
| Open Datasets | Yes | We investigate the performance of pruned models on 7 image classification datasets, namely ImageNet-1k [55], FGVC Aircraft [41], Oxford-IIIT Pets [49], DTD Textures [11], EuroSAT [26] and CIFAR 10/100 [33], plus Pascal VOC 2012 [17] for semantic segmentation. Table 4 lists all the datasets used in this paper alongside their license and citation. |
| Dataset Splits | Yes | We use the train/test splits defined by the dataset authors where possible, except for EuroSAT, for which we use an 80/20 stratified split as indicated by the dataset paper. We always report the performance on the test split, except for ImageNet-1k and Pascal VOC, for which we report performance on the validation split. For the linear classification experiments we use the validation split defined by the dataset authors if available, and otherwise create one using an 80/20 random split. |
| Hardware Specification | Yes | The pruning experiments were run using an NVIDIA A100 GPU with 40 GB of VRAM, 16 CPU cores, and 40 GB of RAM. |
| Software Dependencies | No | We evaluate pruned models in k-nearest neighbor classification using the implementation from scikit-learn [50]. |
| Experiment Setup | Yes | We prune models to six target sparsities, namely 10, 20, 30, 40, 50, and 60% in one shot. To do so, we first estimate gradients using either a DINO or a cross-entropy loss and 1000 random samples from the ImageNet-1k training set (unless specified otherwise) and batch size 16. Gradients are averaged over each batch and summed across batches. We do not use any data augmentation for the cross-entropy loss, and for the DINO loss, we only use random cropping to generate 2 global and 10 local crops, with scales between (0.25, 1.0) and (0.05, 0.25), respectively. |
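
The gradient accumulation described in the Experiment Setup row ("averaged over each batch and summed across batches") can be illustrated with a minimal sketch. This is not the authors' code: the function name `accumulate_gradients` is hypothetical, the per-sample gradients are random toy values rather than real model gradients, and keeping the final partial batch (instead of dropping it) is an assumption the quoted text does not settle.

```python
import random


def accumulate_gradients(per_sample_grads, batch_size=16):
    """Average per-sample gradients within each batch, then sum the batch means.

    Hypothetical helper for illustration only; the paper's implementation may differ.
    """
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    num_batches = 0
    for start in range(0, len(per_sample_grads), batch_size):
        batch = per_sample_grads[start:start + batch_size]
        # Mean over the samples in this batch (the last batch may be smaller).
        batch_mean = [sum(g[i] for g in batch) / len(batch) for i in range(dim)]
        # Sum the batch means across batches.
        total = [t + m for t, m in zip(total, batch_mean)]
        num_batches += 1
    return total, num_batches


rng = random.Random(0)
# Toy stand-in for real gradients: 1000 samples, 4 parameters each.
grads = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(1000)]
accumulated, num_batches = accumulate_gradients(grads, batch_size=16)
print(num_batches)  # 63: 62 full batches of 16 plus one batch of 8
```

With 1000 samples and batch size 16, the loop sees 62 full batches and one batch of 8 under the keep-the-remainder assumption. Because batch means are summed rather than averaged, the accumulated magnitude grows with the number of batches.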