Exploring Intrinsic Dimension for Vision-Language Model Pruning
Authors: Hanzhang Wang, Jiawen Zhang, Qingyuan Ma
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically study ID variations in large-scale vision-language pre-trained models and examine the contributions of different modalities to model prunability. We propose a layer importance metric based on ID, which can conveniently integrate with current metrics and enhance performance in vision-language model pruning. The experimental results show a high correlation between ID and modality prunability. (A hedged sketch of ID estimation appears below the table.) |
| Researcher Affiliation | Academia | 1School of Computer Engineering and Science, Shanghai University. |
| Pseudocode | Yes | Algorithm 1 Iterative Pruning with Intrinsic Dimension |
| Open Source Code | Yes | The code is available at https://github.com/Nofear18/ID_VL_Pruning |
| Open Datasets | Yes | We evaluate image captioning performance using the MSCOCO dataset (Lin et al., 2014); we evaluate the visual reasoning task using the NLVR2 dataset (Suhr et al., 2018); the Flickr30k dataset comprises over 30,000 images; we utilize the CIFAR-100 (Krizhevsky, 2009) and ImageNet-1k (Russakovsky et al., 2015) datasets. |
| Dataset Splits | Yes | MSCOCO dataset (Lin et al., 2014), encompassing 80 object and 91 stuff categories with a standard split of 118K training images and 5K images each for validation and testing |
| Hardware Specification | Yes | All of our experiments are conducted on 4 NVIDIA GeForce RTX 3090 GPUs using PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'AdamW optimizer' but does not specify version numbers for these software components, which this reproducibility criterion requires. |
| Experiment Setup | Yes | We use a cubic pruning schedule similar to Sanh et al. (2020); Zhang et al. (2022) for the experiments in rows 1-4 of Table 8. This schedule includes an initial warm-up of $t_i$ steps and a final warm-up of $t_f$ steps, defined by: $r(t) = r(0)$ if $0 \le t < t_i$; $r(t) = r(T) + \bigl(r(0) - r(T)\bigr)\left(1 - \frac{t - t_i}{T - t_i - t_f}\right)^3$ if $t_i \le t < T - t_f$; $r(t) = r(T)$ otherwise, where $t_i = i \cdot l$, $t_f = f \cdot l$, and $l$ is the length of the training dataloader. All experiments use the AdamW optimizer (Loshchilov & Hutter, 2018), with additional hyperparameters detailed in Table 8. (A runnable sketch of this schedule appears below the table.) |
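The layer importance metric quoted in the "Research Type" row is built on intrinsic dimension (ID) estimates of layer representations, but the excerpt does not state which estimator the authors use. The sketch below is a minimal illustration only, assuming the widely used TwoNN MLE estimator (Facco et al., 2017); the function name `twonn_id` and the brute-force NumPy neighbor search are illustrative choices, not the authors' implementation (see the released code at https://github.com/Nofear18/ID_VL_Pruning for the actual method).

```python
import numpy as np

def twonn_id(features: np.ndarray) -> float:
    """Estimate intrinsic dimension with the TwoNN MLE estimator
    (Facco et al., 2017): d = N / sum_i log(r2_i / r1_i), where r1_i
    and r2_i are point i's first- and second-nearest-neighbor
    distances. `features` has shape (N, D), one row per sample."""
    # Pairwise Euclidean distances; brute force is fine for a few
    # thousand samples, which is typical for ID estimation.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)        # exclude self-distances
    sorted_d = np.sort(dists, axis=1)
    r1, r2 = sorted_d[:, 0], sorted_d[:, 1]
    mu = r2 / r1
    mu = mu[mu > 1.0]                      # drop degenerate ties
    return len(mu) / np.log(mu).sum()
```

In this framing, a per-layer ID computed from that layer's activations would serve as the importance score that the paper integrates with existing pruning metrics.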
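The cubic schedule in the "Experiment Setup" row maps directly to code. Below is a minimal sketch under the stated definitions ($t_i = i \cdot l$, $t_f = f \cdot l$); the signature and variable names are assumptions for illustration, not taken from the released code.

```python
def remaining_ratio(t: int, T: int, l: int, r0: float, rT: float,
                    i: float, f: float) -> float:
    """Cubic pruning schedule (Sanh et al., 2020; Zhang et al., 2022).

    r0 / rT are the initial / final remaining ratios r(0) and r(T);
    the first t_i = i*l steps hold r0, the last t_f = f*l steps hold
    rT, and the ratio decays cubically in between.
    """
    ti, tf = int(i * l), int(f * l)
    if t < ti:
        return r0
    if t < T - tf:
        return rT + (r0 - rT) * (1.0 - (t - ti) / (T - ti - tf)) ** 3
    return rT
```

Note that at $t = t_i$ the cubic term equals 1, giving $r(0)$, and at $t = T - t_f$ it equals 0, giving $r(T)$, so the schedule is continuous at both phase boundaries.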