ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models

Authors: Yi-Lin Sung, Jaehong Yoon, Mohit Bansal

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate our proposed method across various multimodal and unimodal models and datasets, demonstrating significant performance improvements over prevalent pruning techniques in the high-sparsity regime.
Researcher Affiliation | Academia | Yi-Lin Sung, Jaehong Yoon, Mohit Bansal; Department of Computer Science, UNC Chapel Hill; {ylsung, jhyoon, mbansal}@cs.unc.edu
Pseudocode | Yes | Algorithm 1: Efficient Coarse-to-Fine Layer-wise Pruning (a hedged sketch of the coarse-to-fine flow is given after this table).
Open Source Code | Yes | Our project page and code are available at https://ecoflap.github.io/
Open Datasets | Yes | We evaluate the zero-shot ability of BLIP-2 on various datasets after pruning, such as VQAv2, OK-VQA, and GQA for visual question answering, NoCaps for image captioning, and Flickr30k for image-text retrieval. For BLIP, we evaluate the performance change of the BLIP fine-tuned on NLVR2 and COCO captions. For unimodal models, we evaluate FlanT5 on MMLU, evaluate LLaMA 7B with WikiText, and evaluate EVA-ViT with ImageNet-1k.
Dataset Splits | Yes | We use the validation set for NoCaps and the test set for Flickr30k. In our BLIP experiments, we report results on both the val and test sets for NLVR2, while we use the Karpathy test split (Karpathy & Fei-Fei, 2014) for COCO captions. In our unimodal experiments, we use the publicly available test set for MMLU, and we use the validation set for ImageNet-1k. For WikiText, we report the perplexity on the validation set.
Hardware Specification | Yes | All the experiments are done with one 40GB A100 or one 48GB A6000, except for ECoFLaP with the first-order gradient on LLaMA, where we use 2x 48GB A6000 GPUs.
Software Dependencies | No | The paper mentions that "Our codes are based on the publicly available LAVIS (Li et al., 2023b), Wanda (Sun et al., 2023a), MMLU (Hendrycks et al., 2021), UPop (Shi et al., 2023), and CoOp (Zhou et al., 2021)", but it does not specify version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | For the coarse step, we set p_M = p + 0.1, |D| = 32, and the number of noise samples to 1 for ECoFLaP with the zeroth-order gradient. ϵ is set to 1e-3 except for LLaMA, where we find 1e-1 works better, but we almost did not tune this hyperparameter. For ECoFLaP with the first-order gradient, we use |D| = 128 (these values are collected in the configuration sketch below).
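
To make the coarse-to-fine flow of Algorithm 1 more concrete, the Python sketch below walks through the two steps: a zeroth-order (forward-only) gradient estimate on a small calibration batch gives each layer a global importance score, the scores are turned into per-layer sparsity ratios capped at p_M, and a local Wanda-style criterion then prunes each layer to its assigned ratio. This is a minimal sketch under my own assumptions, not the authors' implementation: the helper names (zeroth_order_layer_scores, scores_to_sparsities, wanda_style_prune), the loss_fn(model, calib_batch) interface, and the score-to-sparsity mapping are all illustrative.

import torch

@torch.no_grad()
def zeroth_order_layer_scores(model, layers, calib_batch, loss_fn, eps=1e-3, n_noise=1):
    """Coarse step: estimate a per-layer global importance score with a
    forward-only (zeroth-order) gradient approximation on a small calibration batch."""
    scores = {}
    for name, weight in layers.items():  # layers: {name: weight tensor to be pruned}
        score = 0.0
        for _ in range(n_noise):
            noise = torch.randn_like(weight)
            weight.add_(eps * noise)                    # evaluate loss at W + eps*z
            loss_plus = loss_fn(model, calib_batch)
            weight.sub_(2 * eps * noise)                # evaluate loss at W - eps*z
            loss_minus = loss_fn(model, calib_batch)
            weight.add_(eps * noise)                    # restore the original W
            grad_est = (loss_plus - loss_minus) / (2 * eps) * noise
            score += (grad_est * weight).abs().sum().item()  # aggregate |g * w| over the layer
        scores[name] = score / n_noise
    return scores

def scores_to_sparsities(scores, target_sparsity, max_layer_sparsity):
    """Map layer scores to per-layer sparsity ratios: more important layers keep
    more weights; every ratio is capped at p_M = p + 0.1. (Simplified allocation rule.)"""
    names = list(scores)
    s = torch.tensor([scores[n] for n in names])
    keep = s / s.mean() * (1.0 - target_sparsity)       # average keep ratio equals 1 - p
    sparsities = (1.0 - keep).clamp(0.0, max_layer_sparsity)
    return dict(zip(names, sparsities.tolist()))

@torch.no_grad()
def wanda_style_prune(weight, act_norm, sparsity):
    """Fine step: local layer-wise pruning with a Wanda-like metric |W| * ||X||,
    zeroing the lowest-scoring fraction of weights in this layer."""
    metric = weight.abs() * act_norm                    # act_norm: L2 norm of each input activation
    k = int(metric.numel() * sparsity)
    if k == 0:
        return
    threshold = metric.flatten().kthvalue(k).values
    weight.mul_(metric > threshold)

With these pieces, the fine step would loop over the layers and call wanda_style_prune(weight, act_norm, sparsities[name]) with the ratio assigned by the coarse step instead of a uniform one.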
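
The values in the Experiment Setup row can also be collected into a small configuration object for reference; the sketch below only records the reported numbers, and the field names are my own (hypothetical), not taken from the released code.

from dataclasses import dataclass

@dataclass
class ECoFLaPCoarseConfig:
    target_sparsity: float                 # p, the global sparsity the run aims for
    max_sparsity_offset: float = 0.1       # p_M = p + 0.1 caps each layer's sparsity
    calib_size: int = 32                   # |D| = 32 for the zeroth-order variant
    num_noise_samples: int = 1             # one noise sample per zeroth-order estimate
    eps: float = 1e-3                      # 1e-1 is reported to work better for LLaMA

    @property
    def max_layer_sparsity(self) -> float:
        return self.target_sparsity + self.max_sparsity_offset

# The first-order variant mainly differs in the calibration-set size: |D| = 128.
FIRST_ORDER_CALIB_SIZE = 128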