Unified Visual Transformer Compression
Authors: Shixing Yu, Tianlong Chen, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Liu, Zhangyang Wang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments are conducted with several ViT variants, e.g. DeiT and T2T-ViT backbones on the ImageNet dataset, and our approach consistently outperforms recent competitors. |
| Researcher Affiliation | Collaboration | ¹University of Texas at Austin, ²Texas A&M University, ³Kwai Inc. |
| Pseudocode | Yes | Algorithm 1: Gradient-based algorithm to solve problem (5) for Unified ViT Compression. Input: Resource budget R_budget, learning rates η1, η2, η3, η4, η5, η6, number of total iterations τ. Result: Transformer pruned weights W. |
| Open Source Code | Yes | Codes are available online: https://github.com/VITA-Group/UVC. |
| Open Datasets | Yes | We conduct experiments for image classification on ImageNet (Krizhevsky et al., 2012). |
| Dataset Splits | No | The paper states 'We conduct experiments for image classification on ImageNet (Krizhevsky et al., 2012),' and mentions 'validation' in Section 3.1 (Preliminary), but it does not provide specific details on the dataset splits (e.g., percentages or sample counts for training, validation, and testing). |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as programming languages or libraries used in the implementation. |
| Experiment Setup | Yes | Numerically, the learning rate for parameter z is always changing during the primal-dual algorithm process. Thus, we propose to use a dynamic learning rate for the parameter z that controls the budget constraint. We use a four-step schedule of {1, 5, 9, 13, 17} in practice. |
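
The Experiment Setup row describes a stepped learning rate for the dual variable z in a primal-dual loop. The following is a minimal sketch of that idea on a toy problem, not the authors' implementation. It assumes the five listed values are successive learning-rate levels applied over equal-length segments of training, which the quoted text does not confirm. The function names, the toy objective, and the `base_lr_z` scaling factor are all hypothetical and chosen only so the toy loop stays stable.

```python
def stepped_lr(iteration, total_iterations, levels=(1, 5, 9, 13, 17)):
    """Piecewise-constant schedule (assumed interpretation of the quoted values).

    Training is split into len(levels) equal segments; each segment uses one level.
    """
    if not 0 <= iteration < total_iterations:
        raise ValueError("iteration must be in [0, total_iterations)")
    segment = iteration * len(levels) // total_iterations
    return levels[segment]


def solve_toy_budget_problem(target=3.0, budget=1.0, total_iterations=5000,
                             lr_w=0.01, base_lr_z=0.01):
    """Toy primal-dual loop: minimize (w - target)^2 subject to w <= budget.

    Lagrangian: L(w, z) = (w - target)^2 + z * (w - budget), with z >= 0.
    base_lr_z is a hypothetical scale that keeps this toy example stable.
    """
    w, z = 0.0, 0.0
    for t in range(total_iterations):
        # Primal step: gradient descent on L with respect to w.
        grad_w = 2.0 * (w - target) + z
        w -= lr_w * grad_w
        # Dual step: projected gradient ascent on z with the stepped rate.
        lr_z = base_lr_z * stepped_lr(t, total_iterations)
        z = max(0.0, z + lr_z * (w - budget))
    return w, z


if __name__ == "__main__":
    w, z = solve_toy_budget_problem()
    print(f"w = {w:.4f}, z = {z:.4f}")
```

For this toy problem the constrained optimum is w = 1 with multiplier z = 4, so the loop can be checked against a known answer. Raising the dual learning rate in steps makes the constraint penalty react more strongly later in training, which is one plausible reading of why a dynamic rate for z would help.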