COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models

Authors: Jinqi Xiao, Miao Yin, Yu Gong, Xiao Zang, Jian Ren, Bo Yuan

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For compressing DeiT-small and DeiT-base models on ImageNet, our proposed approach can achieve 0.45% and 0.76% higher top-1 accuracy even with fewer parameters. Our finding can also be applied to improve the customization efficiency of text-to-image diffusion models, with much faster training (up to 2.6× speedup) and lower extra storage cost (up to 1927.5× reduction) than the existing works. (from Section 4, Experiments)
Researcher Affiliation | Collaboration | 1 Rutgers University, 2 Snap Inc.
Pseudocode | No | The paper includes equations, flowcharts (Figures 3, 4, and 5), and prose descriptions of procedures, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code and models are publicly available at https://github.com/jinqixiao/ComCAT.
Open Datasets | Yes | We first validate our approach on the ImageNet-1K dataset (Deng et al., 2009) for the image classification task. The dataset includes 1.2M training images and 50K validation samples.
Dataset Splits | Yes | The dataset includes 1.2M training images and 50K validation samples. (See the loader sketch after the table.)
Hardware Specification | Yes | We further measure the practical speedups of our compressed models on various computing hardware platforms, including Nvidia Tesla V100, Nvidia Jetson TX2, Android mobile phone (Snapdragon 855, 4 Cortex-A76 + 4 Cortex-A55), ASIC accelerator Eyeriss (Chen et al., 2016), and FPGA (PYNQ-Z1). We train all the approaches on one Nvidia RTX A6000 GPU.
Software Dependencies | No | The paper mentions using baseline models from DeiT and model weights from the Hugging Face Hub, but it does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch or Python versions).
Experiment Setup | Yes | In the fine-tuning process, the initial learning rate is set to 0.0001 and decreases to a minimum learning rate of 0.000001 with the cosine scheduler. The weight decay for training the compressed DeiT-small is set to 0.005. We train all the approaches on one Nvidia RTX A6000 GPU with a batch size of 1 and 500 training steps. All images were generated with 50 steps of the PNDM sampler (Liu et al., 2022) at a guidance scale of 7. (See the configuration sketch after the table.)
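
For the Dataset Splits row, a minimal sketch of loading the stated ImageNet-1K train/validation splits with torchvision. The transforms and root path are assumptions; the paper's excerpt only gives the split sizes, not its data pipeline:

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet preprocessing (assumed; the paper does not list its transforms).
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_tf = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])
val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

# Root path is a placeholder; ImageNet-1K must be obtained separately.
root = "/data/imagenet"
train_set = datasets.ImageNet(root, split="train", transform=train_tf)
val_set = datasets.ImageNet(root, split="val", transform=val_tf)

# The official splits match the counts quoted from the paper:
# ~1.28M training images and 50K validation images.
print(len(train_set), len(val_set))

train_loader = torch.utils.data.DataLoader(train_set, batch_size=256,
                                           shuffle=True, num_workers=8)
```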
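
For the Experiment Setup row, a minimal sketch of the two quoted configurations: the cosine learning-rate schedule used when fine-tuning the compressed DeiT models, and PNDM sampling for the diffusion-customization evaluation. The optimizer choice (AdamW), the epoch count, the Stable Diffusion checkpoint ID, and the prompt are assumptions not stated in the excerpt:

```python
import torch
from diffusers import StableDiffusionPipeline, PNDMScheduler

# --- Cosine LR schedule for compressed-DeiT fine-tuning (values quoted) ---
model = torch.nn.Linear(10, 10)  # stand-in for the compressed DeiT-small
optimizer = torch.optim.AdamW(   # optimizer choice is an assumption
    model.parameters(),
    lr=1e-4,             # initial learning rate from the paper
    weight_decay=0.005)  # DeiT-small weight decay from the paper
total_epochs = 300       # assumption; the excerpt does not state the epoch count
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs, eta_min=1e-6)  # decays to the stated minimum LR

for epoch in range(total_epochs):
    optimizer.step()   # (loss computation omitted in this sketch)
    scheduler.step()

# --- PNDM sampling for the diffusion-customization evaluation (values quoted) ---
# The checkpoint ID is an assumption; the paper only says weights come from
# the Hugging Face Hub.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipe.scheduler = PNDMScheduler.from_config(pipe.scheduler.config)
image = pipe("a photo of a dog",      # hypothetical prompt
             num_inference_steps=50,  # 50 PNDM steps, per the paper
             guidance_scale=7.0).images[0]
image.save("sample.png")
```

Note that the 500-step, batch-size-1 regime quoted above refers to the diffusion-customization training on the single RTX A6000, whereas the learning-rate and weight-decay values refer to the DeiT compression fine-tuning; the sketch keeps the two configurations separate accordingly.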