COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Authors: Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters. Our finding can also be applied to improve the customization efficiency of text-to-image diffusion models, with much faster training (up to 2.6× speedup) and lower extra storage cost (up to 1927.5× reduction) than the existing works. |
| Researcher Affiliation | Collaboration | 1Rutgers University 2Snap Inc. |
| Pseudocode | No | The paper includes equations, flowcharts (Figure 3, 4, 5), and descriptions of procedures, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code and models are publicly available at https://github.com/jinqixiao/ComCAT. |
| Open Datasets | Yes | We first validate our approach on the ImageNet-1K dataset (Deng et al., 2009) for the image classification task. The dataset includes 1.2M training images and 50K validation samples. |
| Dataset Splits | Yes | The dataset includes 1.2M training images and 50K validation samples. |
| Hardware Specification | Yes | We further measure the practical speedups of our compressed models on various computing hardware platforms, including Nvidia Tesla V100, Nvidia Jetson TX2, Android mobile phone (Snapdragon 855, 4 Cortex-A76 + 4 Cortex-A55), ASIC accelerator Eyeriss (Chen et al., 2016), and FPGA (PYNQ Z1). We train all the approaches on one Nvidia RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions using baseline models from Dei T and model weights from Hugging Face Hub, but it does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch version, Python version, etc.). |
| Experiment Setup | Yes | In the fine-tuning process, the initial learning rate is set as 0.0001 and decreases to the minimum learning rate of 0.000001 with the Cosine scheduler. The weight decay for training the compressed DeiT-small is set as 0.005. We train all the approaches on one Nvidia RTX A6000 GPU with the batch size as 1 and the number of training steps as 500. All images were generated with 50 steps of the PNDM (Liu et al., 2022) sampler with a guidance scale of 7. |
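
The quoted diffusion-customization setup maps onto standard PyTorch and Hugging Face `diffusers` calls. Below is a minimal sketch, not the authors' code: the AdamW optimizer, the `runwayml/stable-diffusion-v1-5` checkpoint, and the prompt are assumptions for illustration; only the learning rates (1e-4 decayed to 1e-6 with a cosine schedule), weight decay 0.005, batch size 1, 500 training steps, PNDM sampler, 50 inference steps, and guidance scale 7 come from the paper.

```python
import torch
from diffusers import StableDiffusionPipeline, PNDMScheduler

# Assumed base checkpoint; the quoted setup does not name one.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to("cuda")

# Fine-tuning schedule from the quoted setup: initial LR 1e-4 with cosine
# decay to a floor of 1e-6 over 500 steps, weight decay 0.005, batch size 1.
# AdamW itself is an assumption, not stated in the quote.
optimizer = torch.optim.AdamW(
    pipe.unet.parameters(), lr=1e-4, weight_decay=0.005
)
lr_schedule = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=500, eta_min=1e-6  # 500 training steps, min LR 1e-6
)

# Sampling settings from the quoted setup: 50 PNDM steps, guidance scale 7.
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
image = pipe(
    "a photo of a dog",   # placeholder prompt
    num_inference_steps=50,
    guidance_scale=7.0,
).images[0]
image.save("sample.png")
```

In `diffusers`, swapping the scheduler via `PNDMScheduler.from_config` reuses the pipeline's existing noise-schedule configuration, so only the sampling algorithm changes, matching the paper's choice of the PNDM sampler (Liu et al., 2022).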