COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

Authors: Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters. Our finding can also be applied to improve the customization efficiency of text-to-image diffusion models, with much faster training (up to 2.6× speedup) and lower extra storage cost (up to 1927.5× reduction) than the existing works. (from Section 4, Experiments)
Researcher Affiliation | Collaboration | 1 Rutgers University, 2 Snap Inc.
Pseudocode | No | The paper includes equations, flowcharts (Figures 3, 4, and 5), and prose descriptions of procedures, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and models are publicly available at https://github.com/jinqixiao/ComCAT.
Open Datasets | Yes | We first validate our approach on the ImageNet-1K dataset (Deng et al., 2009) for the image classification task. The dataset includes 1.2M training images and 50K validation samples.
Dataset Splits | Yes | The dataset includes 1.2M training images and 50K validation samples. (See the loader sketch after the table.)
Hardware Specification | Yes | We further measure the practical speedups of our compressed models on various computing hardware platforms, including Nvidia Tesla V100, Nvidia Jetson TX2, Android mobile phone (Snapdragon 855, 4 Cortex-A76 + 4 Cortex-A55), ASIC accelerator Eyeriss (Chen et al., 2016), and FPGA (PYNQ-Z1). We train all the approaches on one Nvidia RTX A6000 GPU.
Software Dependencies | No | The paper mentions using baseline models from DeiT and model weights from the Hugging Face Hub, but it does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch or Python versions).
Experiment Setup | Yes | In the fine-tuning process, the initial learning rate is set to 0.0001 and decreases to a minimum learning rate of 0.000001 with the cosine scheduler. The weight decay for training the compressed DeiT-small is set to 0.005. We train all the approaches on one Nvidia RTX A6000 GPU with a batch size of 1 and 500 training steps. All images were generated with 50 steps of the PNDM sampler (Liu et al., 2022) at a guidance scale of 7. (See the configuration sketch after the table.)
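
For the Dataset Splits row, a minimal sketch of loading the stated ImageNet-1K train/validation splits with torchvision. The transforms and root path are assumptions; the paper's excerpt only gives the split sizes, not its data pipeline:

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet preprocessing (assumed; the paper does not list its transforms).
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

# Root path is a placeholder; ImageNet-1K must be obtained separately.
root = "/data/imagenet"
train_set = datasets.ImageNet(root, split="train", transform=train_tf)
val_set = datasets.ImageNet(root, split="val", transform=val_tf)

# The official splits match the counts quoted from the paper:
# ~1.28M training images and 50K validation images.
print(len(train_set), len(val_set))

train_loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                           shuffle=True, num_workers=8)
```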
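
For the Experiment Setup row, a minimal sketch of the two quoted configurations: the cosine learning-rate schedule used when fine-tuning the compressed DeiT models, and PNDM sampling for the diffusion-customization evaluation. The optimizer choice (AdamW), the epoch count, the Stable Diffusion checkpoint ID, and the prompt are assumptions not stated in the excerpt:

```python
import torch
from diffusers import StableDiffusionPipeline, PNDMScheduler

# --- Cosine LR schedule for compressed-DeiT fine-tuning (values quoted) ---
model = torch.nn.Linear(10, 10)  # stand-in for the compressed DeiT-small
optimizer = torch.optim.AdamW(   # optimizer choice is an assumption
    model.parameters(),
    lr=1e-4,             # initial learning rate from the paper
    weight_decay=0.005)  # DeiT-small weight decay from the paper
total_epochs = 300       # assumption; the excerpt does not state the epoch count
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs, eta_min=1e-6)  # decays to the stated minimum LR

for epoch in range(total_epochs):
    optimizer.step()   # (loss computation omitted in this sketch)
    scheduler.step()

# --- PNDM sampling for the diffusion-customization evaluation (values quoted) ---
# The checkpoint ID is an assumption; the paper only says weights come from
# the Hugging Face Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
image = pipe("a photo of a dog",      # hypothetical prompt
             num_inference_steps=50,  # 50 PNDM steps, per the paper
             guidance_scale=7.0).images[0]
image.save("sample.png")
```

Note that the 500-step, batch-size-1 regime quoted above refers to the diffusion-customization training on the single RTX A6000, whereas the learning-rate and weight-decay values refer to the DeiT compression fine-tuning; the sketch keeps the two configurations separate accordingly.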