Pruning vs Quantization: Which is Better?

Authors: Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of expected quantization and pruning error for general data distributions. Then, we provide lower bounds for the per-layer pruning and quantization error in trained networks, and compare these to empirical error after optimization. Finally, we provide an extensive experimental comparison for training 9 large-scale models on 4 tasks.
Researcher Affiliation | Industry | Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort; Qualcomm AI Research, Amsterdam, The Netherlands; {akuzmin, markusn, mart, behboodi, tijmen}@qti.qualcomm.com
Pseudocode | No | The paper includes mathematical formulations and descriptions of methods, but it does not contain clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps.
Open Source Code | Yes | Code is available at https://github.com/Qualcomm-AI-research/pruning-vs-quantization
Open Datasets | Yes | In our experiments we used a set of models trained for 4 tasks including ResNet18, ResNet50 [27], MobileNetV2 [58], MobileNetV3-small [30], EfficientNet-lite [60], and ViT [11] trained on ImageNet classification [57]; DeepLabV3 [7] with MobileNetV2 backbone trained for semantic segmentation on Pascal VOC [13]; EfficientDet [61] trained for object detection on MS COCO [43]; and OPT-350 fine-tuned on WikiText-103.
Dataset Splits | No | The paper mentions using well-known datasets and fine-tuning, noting that 'full details on hyperparameters are given in appendix G'. However, it does not explicitly provide training, validation, or test split percentages, absolute sample counts, or citations to predefined splits in the main text.
Hardware Specification | No | The paper explicitly states: 'In our comparison, we intentionally avoid considering the hardware aspects of pruning and quantization.' and 'First, our work has not extensively considered the hardware implications of pruning or quantization.' No specific hardware used for the experiments (e.g., GPU models, CPU types) is detailed.
Software Dependencies | Yes | In our work, we used CVX solver [21]. [...] We solve this problem using the branch-and-bound method implemented in the Gurobi solver [23] that gives the global solution. (Reference [21] is 'CVX: Matlab software for disciplined convex programming, version 2.1' and reference [23] is the 'Gurobi Optimizer Reference Manual, 2023', so specific versions are given for both solvers.)
Experiment Setup | No | For a fair comparison, we used the same amount of epochs of fine-tuning for each method (full details on hyperparameters are given in appendix G).
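
To make the comparison summarized in the "Research Type" row concrete, here is a minimal sketch (not taken from the paper's code) that contrasts the mean-squared error of b-bit uniform quantization with that of magnitude pruning at a matched compression ratio on a synthetic Gaussian weight tensor. The FP16 baseline, tensor size, and quantizer details are illustrative assumptions.

```python
import numpy as np

def quantization_error(w, bits):
    # Symmetric uniform quantization with a max-abs range (illustrative baseline, not the paper's quantizer).
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    w_q = np.clip(np.round(w / scale), -levels, levels) * scale
    return np.mean((w - w_q) ** 2)

def pruning_error(w, keep_fraction):
    # Magnitude pruning: keep only the largest-magnitude fraction of weights, zero out the rest.
    k = max(1, int(round(keep_fraction * w.size)))
    threshold = np.sort(np.abs(w))[::-1][k - 1]
    w_p = np.where(np.abs(w) >= threshold, w, 0.0)
    return np.mean((w - w_p) ** 2)

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)  # synthetic Gaussian "layer" weights

for bits in (2, 3, 4, 5, 6, 7, 8):
    keep = bits / 16.0  # match storage cost against an assumed FP16 baseline
    print(f"{bits}-bit quantization MSE: {quantization_error(w, bits):.2e} | "
          f"pruning ({keep:.0%} kept) MSE: {pruning_error(w, keep):.2e}")
```

The keep fraction `bits / 16` equates the storage cost of b-bit weights with keeping that fraction of FP16 weights, so the two errors are compared at the same nominal compression ratio.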
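
The "Software Dependencies" row refers to obtaining per-layer lower bounds by solving a mixed-integer problem to global optimality with Gurobi's branch-and-bound. As a heavily simplified, hypothetical illustration of that kind of problem (not the paper's actual formulation), the sketch below selects a binary pruning mask that minimizes a layer's output squared error under a sparsity budget; it assumes `gurobipy` and a Gurobi license are installed, and all dimensions and data are toy placeholders.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

rng = np.random.default_rng(0)
d, n, keep = 16, 64, 8             # toy sizes: input dim, calibration samples, weights kept
X = rng.standard_normal((n, d))    # hypothetical calibration activations
w = rng.standard_normal(d)         # hypothetical layer weights

model = gp.Model("optimal_pruning_mask")
mask = model.addVars(d, vtype=GRB.BINARY, name="m")
model.addConstr(gp.quicksum(mask[i] for i in range(d)) == keep)

# Output-space squared error || X (w * mask) - X w ||^2, a convex quadratic in the binary mask.
err = gp.QuadExpr()
for s in range(n):
    residual = gp.quicksum(float(X[s, i] * w[i]) * mask[i] for i in range(d)) - float(X[s] @ w)
    err += residual * residual
model.setObjective(err, GRB.MINIMIZE)
model.optimize()

best_mask = np.array([mask[i].X for i in range(d)])
print("globally optimal mask:", best_mask, "objective:", model.ObjVal)
```

Because branch-and-bound certifies global optimality, the resulting objective acts as a lower bound on the per-layer error any pruning heuristic can reach, which mirrors how the paper compares such bounds against the empirical error obtained after optimization.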