DFPC: Data flow driven pruning of coupled channels without data.
Authors: Tanay Narshana, Chaitanya Murti, Chiranjib Bhattacharyya
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the efficacy of DFPC for models trained on standard datasets. Since we prune coupled channels, we achieve up to 1.66x improvements in inference time for ResNet-101 trained on CIFAR-10 with a 5% accuracy drop without fine-tuning. With access to the ImageNet training set, we achieve significant improvements over the data-free method and see an improvement of at least 47.1% in speedup for a 2.3% accuracy drop for ResNet-50 against our baselines. |
| Researcher Affiliation | Collaboration | Observe.AI; Robert Bosch Centre for Cyber-Physical Systems, Indian Institute of Science; Department of Computer Science and Automation, Indian Institute of Science |
| Pseudocode | Yes | Algorithm 1 BGSC: Backwards Graph based Saliency Computation |
| Open Source Code | Yes | Our code is publicly available at https://github.com/TanayNarshana/DFPC-Pruning |
| Open Datasets | Yes | On ResNet-101 for the CIFAR-10 dataset, we obtain a 1.66x inference time speedup for a 5% accuracy drop without retraining. On ResNet-50 for the ImageNet dataset, we obtain an inference time speedup of at least 47.1% against our baselines for a 2.3% accuracy drop with retraining. |
| Dataset Splits | No | The paper describes training procedures and hyperparameters but does not explicitly detail a validation dataset split or strategy. |
| Hardware Specification | Yes | Table 3 (Specifications of CPU hardware used for inference time measurements): Model Name: AMD EPYC 7763 64-Core; CPU(s): 256; Thread(s)... The GPU is an NVIDIA 1080 Ti with CUDA 10.2 and 12GB of memory. (A latency-measurement sketch follows the table.) |
| Software Dependencies | Yes | The software stack used for inferencing consisted of Python 3.9.7, PyTorch 1.10.1, and Torchvision 0.11.2. ... The GPU is an NVIDIA 1080 Ti with CUDA 10.2 and 12GB of memory. |
| Experiment Setup | Yes | We train the models using SGD Optimizer with a momentum factor of 0.9 and weight decay of 5e-4 for 200 epochs using Cosine Annealing step sizes with an initial learning rate of 0.1. ... We prune 1% of the remaining channels in each pruning iteration followed by a finetuning of 3 epochs, with step sizes of 1e-3, 1e-4, 1e-5 per pruning iteration. The batch size was 256. After the pruning ends, we finally finetune the network for 90 epochs with a batch size of 512. We use the SGD Optimizer with a momentum factor of 0.9 and weight decay of 1e-4 and Cosine Annealed step sizes with an initial learning rate of 0.1. (A minimal sketch of this schedule appears after the table.) |
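
As a reading aid for the Experiment Setup row, the following is a minimal PyTorch sketch of the quoted schedule, not the authors' code: `model`, `train_loader`, and the `prune_channels` hook are hypothetical placeholders, and DFPC's actual saliency computation (Algorithm 1, BGSC) is not reproduced here.

```python
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR


def train(model, train_loader, epochs, lr, weight_decay, device="cuda"):
    """Shared loop: SGD with momentum 0.9 and cosine-annealed step sizes.
    Base training uses epochs=200, lr=0.1, weight_decay=5e-4; the final
    finetune after pruning uses epochs=90, lr=0.1, weight_decay=1e-4."""
    criterion = nn.CrossEntropyLoss()
    optimizer = SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=weight_decay)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    model.to(device).train()
    for _ in range(epochs):
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()


def prune_then_finetune(model, train_loader, pruning_iters, prune_channels, device="cuda"):
    """Iterative pruning: each iteration removes 1% of the remaining channels and
    finetunes for 3 epochs with per-epoch step sizes 1e-3, 1e-4, 1e-5."""
    criterion = nn.CrossEntropyLoss()
    model.to(device).train()
    for _ in range(pruning_iters):
        prune_channels(model, fraction=0.01)  # hypothetical hook; DFPC's BGSC saliency is not shown
        for lr in (1e-3, 1e-4, 1e-5):
            optimizer = SGD(model.parameters(), lr=lr, momentum=0.9)
            for images, targets in train_loader:
                images, targets = images.to(device), targets.to(device)
                optimizer.zero_grad()
                loss = criterion(model(images), targets)
                loss.backward()
                optimizer.step()
```

The same `train` loop covers both the 200-epoch base training (weight decay 5e-4) and the 90-epoch post-pruning finetune (weight decay 1e-4); only the hyperparameters change.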
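
The inference-time speedups quoted in the table are wall-clock measurements on the listed CPU/GPU stack. Below is a minimal sketch of how such a latency comparison is typically done in PyTorch; the input shape, warm-up count, and repetition count are assumptions, not values from the paper.

```python
import time

import torch


def measure_latency(model, input_shape=(1, 3, 224, 224), warmup=10, runs=100):
    """Average forward-pass latency in milliseconds on the current device."""
    model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        for _ in range(warmup):      # warm-up passes are excluded from the timing
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
        elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs
```

The speedup of a pruned model over its dense baseline is then `measure_latency(dense_model) / measure_latency(pruned_model)`.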