Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Exploring Intrinsic Dimension for Vision-Language Model Pruning
Authors: Hanzhang Wang, Jiawen Zhang, Qingyuan Ma
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study ID variations in large-scale vision-language pre-trained models and examine the contributions of different modalities to model prunability. We propose a layer importance metric based on ID, which can conveniently integrate with current metrics and enhance performance in vision-language model pruning. The experimental results show a high correlation between ID and modality prunability. |
| Researcher Affiliation | Academia | School of Computer Engineering and Science, Shanghai University. |
| Pseudocode | Yes | Algorithm 1 Iterative Pruning with Intrinsic Dimension |
| Open Source Code | Yes | The code is available at https://github.com/Nofear18/ID_VL_Pruning |
| Open Datasets | Yes | We evaluate image captioning performance using the MSCOCO dataset (Lin et al., 2014); We evaluate the visual reasoning task using the NLVR2 dataset (Suhr et al., 2018); The Flickr30k dataset comprises over 30,000 images; We utilize the CIFAR-100 (Krizhevsky, 2009) and ImageNet-1k (Russakovsky, 2015) datasets |
| Dataset Splits | Yes | MSCOCO dataset (Lin et al., 2014), encompassing 80 object and 91 stuff categories with a standard split of 118K training images and 5K images each for validation and testing |
| Hardware Specification | Yes | All of our experiments are conducted on 4 NVIDIA GeForce GTX 3090 GPUs using PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'AdamW optimizer' but does not specify version numbers for these software components. The question requires specific version numbers for reproducibility. |
| Experiment Setup | Yes | We use a cubic pruning schedule similar to Sanh et al. (2020); Zhang et al. (2022) for the experiments in rows 1-4 of Table 8. This schedule includes initial warm-ups, $t_i$, and final warm-ups, $t_f$, defined as: $r(t) = \begin{cases} r(0) & \text{if } 0 \le t < t_i \\ r(T) + \left(r(0) - r(T)\right)\left(1 - \frac{t - t_i - t_f}{T - t_i - t_f}\right)^3 & \text{if } t_i \le t < T - t_f \\ r(T) & \text{otherwise} \end{cases}$ where $t_i = i\,l$, $t_f = f\,l$, and $l$ is the length of the training dataloader. All experiments use the AdamW optimizer (Loshchilov & Hutter, 2018), with additional hyperparameters detailed in Table 8. |
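
The cubic schedule quoted in the Experiment Setup row can be illustrated with a short Python sketch. This is not the authors' implementation: the function name and argument names are hypothetical, and the progress term is measured from the end of the initial warm-up, `(step - warmup_start) / (total_steps - warmup_start - warmup_end)`. That is an assumption made so the ratio decays continuously from r(0) to r(T), and it may differ from the expression as printed in the paper. The released repository is the authoritative source for the exact form.

```python
def cubic_keep_ratio(step, total_steps, r_start, r_end, warmup_start, warmup_end):
    """Remaining-parameter ratio r(t) under a cubic schedule with warm-ups.

    Hypothetical helper for illustration only.
    step          -- current training step t
    total_steps   -- total number of steps T
    r_start/r_end -- ratios r(0) and r(T)
    warmup_start  -- initial warm-up length t_i (no pruning yet)
    warmup_end    -- final warm-up length t_f (ratio held at r(T))
    """
    window = total_steps - warmup_start - warmup_end
    if window <= 0:
        raise ValueError("warm-ups must be shorter than the total number of steps")

    # Initial warm-up: keep the starting ratio.
    if step < warmup_start:
        return r_start

    # Cubic decay between the two warm-up phases.
    # Assumption: progress runs from 0 to 1 across this window.
    if step < total_steps - warmup_end:
        progress = (step - warmup_start) / window
        return r_end + (r_start - r_end) * (1.0 - progress) ** 3

    # Final warm-up: hold the target ratio.
    return r_end


# Example: 100 steps, 10-step initial and 20-step final warm-up.
schedule = [cubic_keep_ratio(s, 100, 1.0, 0.3, 10, 20) for s in range(100)]
```

The cubic term prunes quickly early in the window and slows down as the target ratio is approached, while the final warm-up leaves steps for the model to recover at fixed sparsity.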