Exploring Intrinsic Dimension for Vision-Language Model Pruning

Authors: Hanzhang Wang, Jiawen Zhang, Qingyuan Ma

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically study ID variations in large-scale vision-language pre-trained models and examine the contributions of different modalities to model prunability. We propose a layer importance metric based on ID, which can conveniently integrate with current metrics and enhance performance in vision-language model pruning. The experimental results show a high correlation between ID and modality prunability.
Researcher Affiliation | Academia | School of Computer Engineering and Science, Shanghai University.
Pseudocode | Yes | Algorithm 1: Iterative Pruning with Intrinsic Dimension (a hedged sketch of such a pruning loop appears after the table).
Open Source Code | Yes | The code is available at https://github.com/Nofear18/ID_VL_Pruning
Open Datasets | Yes | We evaluate image captioning performance using the MSCOCO dataset (Lin et al., 2014); we evaluate the visual reasoning task using the NLVR2 dataset (Suhr et al., 2018); the Flickr30k dataset comprises over 30,000 images; we utilize the CIFAR-100 (Krizhevsky, 2009) and ImageNet-1k (Russakovsky et al., 2015) datasets.
Dataset Splits | Yes | MSCOCO dataset (Lin et al., 2014), encompassing 80 object and 91 stuff categories, with a standard split of 118K training images and 5K images each for validation and testing.
Hardware Specification | Yes | All of our experiments are conducted on 4 NVIDIA GeForce RTX 3090 GPUs using PyTorch.
Software Dependencies | No | The paper mentions PyTorch and the AdamW optimizer but gives no version numbers for either, which exact reproduction would require.
Experiment Setup | Yes | We use a cubic pruning schedule similar to Sanh et al. (2020) and Zhang et al. (2022) for the experiments in rows 1-4 of Table 8. The schedule includes initial warm-up steps $t_i$ and final warm-up steps $t_f$, and sets the remaining ratio at step $t$ to $r(t) = r(0)$ if $0 \le t < t_i$; $r(t) = r(T) + \left(r(0) - r(T)\right)\left(1 - \frac{t - t_i}{T - t_i - t_f}\right)^3$ if $t_i \le t < T - t_f$; and $r(t) = r(T)$ otherwise, where $t_i = i \cdot l$, $t_f = f \cdot l$, and $l$ is the length of the training dataloader. All experiments use the AdamW optimizer (Loshchilov & Hutter, 2018), with additional hyperparameters detailed in Table 8. (A sketch of this schedule in code follows the table.)
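
For concreteness, here is a minimal sketch of the cubic schedule quoted above, assuming $r$ denotes the remaining (unpruned) parameter ratio; the function and argument names are illustrative, not from the paper:

```python
def cubic_schedule(t, T, r0, rT, ti, tf):
    """Cubic pruning schedule (after Sanh et al., 2020; Zhang et al., 2022).

    t  : current training step
    T  : total number of training steps
    r0 : initial remaining ratio r(0)
    rT : target remaining ratio r(T)
    ti : initial warm-up steps, ti = i * l (l = len(train_dataloader))
    tf : final warm-up steps,   tf = f * l
    """
    if t < ti:                     # initial warm-up: no pruning yet
        return r0
    if t < T - tf:                 # cubic decay from r(0) to r(T)
        progress = (t - ti) / (T - ti - tf)
        return rT + (r0 - rT) * (1.0 - progress) ** 3
    return rT                      # final warm-up: hold the target ratio
```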
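
Algorithm 1 itself is not reproduced on this page. As a rough illustration of how an ID-based layer importance could plug into iterative magnitude pruning, the following is a hedged sketch only: it assumes a TwoNN-style two-nearest-neighbor ID estimator and per-weight magnitude scores, both common choices that may not match the paper's exact method, and all names (`twonn_id`, `id_weighted_prune`, `layer_weights`) are hypothetical.

```python
import torch

def twonn_id(features, eps=1e-8):
    """Estimate the intrinsic dimension of an (n, d) feature matrix with
    the two-nearest-neighbor (TwoNN) estimator; one common choice, not
    necessarily the paper's."""
    dist = torch.cdist(features, features)        # pairwise distances
    dist.fill_diagonal_(float("inf"))
    r = dist.topk(2, largest=False).values        # r1, r2 per sample
    mu, _ = (r[:, 1] / (r[:, 0] + eps)).sort()    # sorted ratios r2 / r1
    n = mu.numel()
    x = mu[:-1].log()                             # drop last point (F = 1)
    y = -(1.0 - torch.arange(1, n, dtype=mu.dtype) / n).log()
    return (x @ y / (x @ x)).item()               # slope of fit through origin

def id_weighted_prune(model, layer_weights, remain_ratio):
    """One iterative-pruning step: scale each layer's magnitude scores by
    its ID-based importance, then keep the globally top-scoring weights."""
    params = list(model.parameters())
    scores = torch.cat([(p.abs() * w).flatten()
                        for p, w in zip(params, layer_weights)])
    k = max(1, int(remain_ratio * scores.numel()))
    threshold = scores.topk(k).values.min()
    for p, w in zip(params, layer_weights):
        p.data.mul_((p.abs() * w) >= threshold)   # zero out pruned weights
```

In this reading, `layer_weights` would come from running `twonn_id` on each layer's activations over a calibration batch; scaling existing magnitude scores by a per-layer ID term is one plausible interpretation of "integrating ID with current metrics", not a confirmed account of Algorithm 1.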